Re: [PATCH] KVM: x86/mmu: fix potential races when walking host page table

From: Sean Christopherson
Date: Fri Apr 29 2022 - 11:47:38 EST

On Fri, Apr 29, 2022, Paolo Bonzini wrote:
> On 4/29/22 16:35, Sean Christopherson wrote:
> > On Fri, Apr 29, 2022, Paolo Bonzini wrote:
> > > > +out:
> > > > + local_irq_restore(flags);
> > > > + return level;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(kvm_lookup_address_level_in_mm);
> > >
> > > Exporting is not needed.
> > >
> > > Thanks for writing the walk code though. I'll adapt it and integrate the
> > > patch.
> >
> > But why are we fixing this only in KVM? I liked the idea of stealing perf's
> > implementation because it was a seemlingly perfect fit and wouldn't introduce
> > new code (ignoring wrappers, etc...).
> >
> > We _know_ that at least one subsystem is misusing lookup_address_in_pgd() and
> > given that its wrappers are exported, I highly doubt KVM is the only offender.
> > It really feels like we're passing the buck here by burying the fix in KVM.
> There are two ways to do it:
> * having a generic function in mm/. The main issue there is the lack of a
> PG_LEVEL_{P4D,PUD,PMD,PTE} enum at the mm/ level. We could use (ctz(x) -
> 12) / 9 to go from size to level, but it's ugly and there could be
> architectures with heterogeneous page table sizes.
> * having a generic function in arch/x86/. In this case KVM seems to be the
> odd one that doesn't need the PTE. For example vc_slow_virt_to_phys needs
> the PTE, and needs the size rather than the "level" per se.
> So for now I punted, while keeping open the door for moving code from
> arch/x86/kvm/ to mm/ if anyone else (even other KVM ports) need the same
> logic.

Ugh. I was going to say that KVM is the only in-tree user that's subtly broken,
but then I saw vc_slow_virt_to_phys()... So there's at least one other use case
for walking user addresses and being able to tolerate a not-present mapping.

There are no other users of lookup_address_in_mm(), and other than SEV-ES's
dastardly use of lookup_address_in_pgd(), pgd can only ever come from init_mm or
efi_mm, i.e. can't work with user address anyways.

If we go the KVM-only route, can send the below revert along with it? The least
we can do is not give others an easy way to screw up.

Until I saw the #VC crud, I was hoping we could also explicitly prevent using
lookup_address_in_pgd() with user addresses. If/when #VC is fixed, we can/should
add this:

@@ -592,6 +592,15 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,

*level = PG_LEVEL_NONE;

+ /*
+ * The below walk does not guard against user page tables being torn
+ * down, attempting to walk a user address is dangerous and likely to
+ * explode sooner or later. This helper is intended only for use with
+ * kernel-only mm_structs, e.g. init_mm and efi_mm.
+ */
+ if (WARN_ON_ONCE(address < TASK_SIZE_MAX))
+ return NULL;

From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Fri, 29 Apr 2022 07:57:53 -0700
Subject: [PATCH] Revert "x86/mm: Introduce lookup_address_in_mm()"

Drop lookup_address_in_mm() now that KVM is providing it's own variant
of lookup_address_in_pgd() that is safe for use with user addresses, e.g.
guards against page tables being torn down. A variant that provides a
non-init mm is inherently dangerous and flawed, as the only reason to use
an mm other than init_mm is to walk a userspace mapping, and
lookup_address_in_pgd() does not play nice with userspace mappings, e.g.
doesn't disable IRQs to block TLB shootdowns and doesn't use READ_ONCE()
to ensure an upper level entry isn't converted to a huge page between
checking the PAGE_SIZE bit and grabbing the address of the next level

This reverts commit 13c72c060f1ba6f4eddd7b1c4f52a8aded43d6d9.

Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
arch/x86/include/asm/pgtable_types.h | 4 ----
arch/x86/mm/pat/set_memory.c | 11 -----------
2 files changed, 15 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..407084d9fd99 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -559,10 +559,6 @@ static inline void update_page_count(int level, unsigned long pages) { }
extern pte_t *lookup_address(unsigned long address, unsigned int *level);
extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
unsigned int *level);
-struct mm_struct;
-extern pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
- unsigned int *level);
extern pmd_t *lookup_pmd_address(unsigned long address);
extern phys_addr_t slow_virt_to_phys(void *__address);
extern int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn,
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index abf5ed76e4b7..0656db33574d 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -638,17 +638,6 @@ pte_t *lookup_address(unsigned long address, unsigned int *level)

- * Lookup the page table entry for a virtual address in a given mm. Return a
- * pointer to the entry and the level of the mapping.
- */
-pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
- unsigned int *level)
- return lookup_address_in_pgd(pgd_offset(mm, address), address, level);
static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address,
unsigned int *level)

base-commit: 6f363ed2fa4c24c400acc29b659c96e4dc7930e8