Re: [PATCH] KVM: x86: use again the flush argument of __link_shadow_page()

From: Paolo Bonzini

Date: Tue May 05 2026 - 07:52:32 EST


On Mon, May 4, 2026 at 8:36 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Mon, May 04, 2026, Sean Christopherson wrote:
> > x86/mmu
> >
> > On Sun, May 03, 2026, Paolo Bonzini wrote:
> > > Except in the case of parentless nested-TDP pages, mmu_page_zap_pte()
> > > clears the SPTE but leaves the invalid_list empty. In this case, using
> > > kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill.
> > > Avoid flushing the entirety of the remote TLBs unless the invalid_list
> > > was populated: instead, use a more efficient gfn-targeting flush (if
> > > available) and skip it altogether if the caller guarantees that a TLB
> > > flush is not necessary.
> > >
> > > Based-on: <20260503201029.106481-1-pbonzini@xxxxxxxxxx>
> > > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > ---
> > > arch/x86/kvm/mmu/mmu.c | 6 ++++--
> > > 1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 892246204435..85bec8eeace8 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -2541,8 +2541,10 @@ static void __link_shadow_page(struct kvm *kvm,
> > > parent_sp = sptep_to_sp(sptep);
> > > WARN_ON_ONCE(parent_sp->role.level == PG_LEVEL_4K);
> > >
> > > - mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list);
> > > - kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, true);
> > > + if (mmu_page_zap_pte(kvm, parent_sp, sptep, &invalid_list))
> > > + kvm_mmu_commit_zap_page(kvm, &invalid_list);
> > > + else if (flush)
> > > + kvm_flush_remote_tlbs_sptep(kvm, sptep);
> >
> > Duh, this is obvious in hindsight.
> >
> > Reviewed-by: Sean Christopherson <seanjc@xxxxxxxxxx>
>
> An amendment to that: I thought this was just switching back to the more targeted
> range-based flushed, I didn't realize you applied the version that hardcoded the
> @flush param to kvm_mmu_remote_flush_or_zap() with "true".

Yes, I was hoping to simplify stable backports a bit. Here is my
version of your comment (yes, I did try adding batched zapping to
shadow_mmu_split_huge_page() and failed):

/*
* Note: while normally KVM uses a "bool flush" return value to let
* the caller batch flushes, __link_shadow_page() flushes immediately
* immediately before populating the parent PTE with the new shadow page.
* The typical callers, direct_map() and FNAME(fetch)(), are not going
* to zap more than one large SPTE anyway.
*
* The only exception, where @flush can be false, is when a large SPTE
* is replaced with a large SPTE with a fully populated page table,
* which can happen from shadow_mmu_split_huge_page(). In this case,
* no memory is unmapped across the change to the page tables and no
* immediate flush is needed for correctness.
*
* Even in that case, calls to kvm_mmu_commit_zap_page() are not
* batched. Doing so would require adding an invalid_list argument
* all the way down to __walk_slot_rmaps().
*/

What do you think?

Paolo