Re: [PATCH v7 15/17] KVM: selftests: Allocate a dedicated guest page for x86 L2 guest stack
From: Sean Christopherson
Date: Thu May 28 2026 - 14:03:58 EST
On Thu, May 28, 2026, Yosry Ahmed wrote:
> On Wed, May 27, 2026 at 7:56 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Wed, May 27, 2026, Yosry Ahmed wrote:
> > > Instead of relying on the L1-provided stack for L2, which is usually an
> > > array on L1's own stack, allocate a dedicated page of VM memory for the
> > > L2 stack in vcpu_alloc_{vmx/svm}() and use that as L2's RSP in the
> > > VMCS/VMCB instead of the L1-provided value.
> > >
> > > Most L1 guest code does not do anything with the L2 stack other than
> > > stuff it in RSP, so this change is transparent and the L1-provided stack
> > > is silently ignored. The only exception is memstress nested L1 code
> > > which puts the vCPU index on L2's stack, so update this code to use the
> > > newly allocated stack.
> > >
> > > L1-provided stacks will be dropped and cleaned up separately.
> > >
> > > Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > Signed-off-by: Yosry Ahmed <yosry@xxxxxxxxxx>
> > > ---
> >
> > Blech. This exposed a nasty edge in selftests. For tests that enable TDP, the
> > slots need to be identity mapped *after* allocating SVM/VMX. Found out the hard
> > way: the gPAT test starting failing.
> >
> > Not worth worrying about right now, just one more wart in selftests that needs
> > to be cleaned up.
>
> I would squash that diff below to the gPAT test rather than this
> patch, but up to you.
Ya, that's the plan.
> Generally speaking, the tdp_identity_map_default_memslots() API sucks.
> It has to be called after all mappings/allocations are done, which is
> soooo user unfriendly and when it breaks it takes time to figure out
> what's going on.
>
> I think it should go away. Ideally either:
> (a) When TDP is enabled in the VM, tdp_identity_map_default_memslots()
> is called, and then every time we create a new mapping in the VM we
> mirror it in TDP page tables.
>
> (b) Every time we update guest mappings, we mark the TDP page tables
> as "out of sync", and we "resync" on vCPU run as needed.
>
> Either way, it is not as straightforward as just calling
> tdp_identity_map_default_memslots(), because the current code reuses
> the same logic as stage-1 mappings, and it fails if there's already a
> PTE. So consecutive calls will fail if they try to map the whole thing
> -- unless we free all TDP page tables first.
>
> I don't think I have time to spend on this right now, mainly
> documenting my thoughts, and putting this out here in case you (or
> anyone else) has enough hatred for the API to go do this before I get
> the time to.
We're on the same page, this probably doesn't even crack my Top 10 list of things
I want to fix in selftests. Ok, it's probably in the Top 10, but not the Top 5.
I responded on-list mostly to point out the problem in case someone else happens
to have in-flight code that will be affected.