Re: [PATCH 2/2] KVM: x86/mmu: Skip unsync when large pages are allowed

From: Sean Christopherson

Date: Thu Mar 12 2026 - 13:08:44 EST


On Fri, Jan 23, 2026, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@xxxxxxxxxxxx>
>
> Use the large-page metadata to avoid pointless shadow page (SP) searches.
>
> If the target GFN falls within a range where a large page is allowed,
> then there cannot be a shadow page for that GFN; a shadow page in the
> range would itself disallow using a large page. In that case, there
> is nothing to unsync and mmu_try_to_unsync_pages() can return
> immediately.
>
> This is always true for TDP MMU without nested TDP,

I wouldn't expect this to be much of a performance optimization for this case
though, as kvm_get_mmu_page_hash() will return an empty list, i.e.
for_each_gfn_valid_sp_with_gptes() won't do meaningful work anyway.

> and holds for a significant fraction of cases with shadow paging, even when
> all SPs are 4K.
>
> For shadow paging, this optimization theoretically avoids work for about
> 1/e ~= 37% of GFNs, assuming one guest page table per 2M of memory and
> that each GPT falls randomly into the 2M memory buckets. In a simple
> test setup, it skipped unsync in a much higher percentage of cases,
> mainly because the guest buddy allocator clusters GPTs into fewer
> buckets.
>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@xxxxxxxxxxxx>
> ---
> arch/x86/kvm/mmu/mmu.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4535d2836004..555075fb63d9 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2932,6 +2932,14 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>  	struct kvm_mmu_page *sp;
>  	bool locked = false;
>
> +	/*
> +	 * If a large page is allowed, there is no shadow page in the GFN
> +	 * range, because the presence of a shadow page in that range would
> +	 * prevent using a large page.
> +	 */
> +	if (!lpage_info_slot(gfn, slot, PG_LEVEL_2M)->disallow_lpage)
> +		return 0;

Hmm, I'd like to move this to after the write-tracking check, even though, as
implemented in code today, the two are mutually exclusive. Specifically, I don't
want to rely on KVM not supporting write-tracking at 2MiB granularity, and I
also want to avoid confusing readers. E.g. a shallow read of account_shadowed()
would lead people to believe this code is wrong:

	/* the non-leaf shadow pages are keeping readonly. */
	if (sp->role.level > PG_LEVEL_4K)
		return __kvm_write_track_add_gfn(kvm, slot, gfn);

	kvm_mmu_gfn_disallow_lpage(slot, gfn);

if they didn't follow __kvm_write_track_add_gfn() to see:

		/*
		 * new track stops large page mapping for the
		 * tracked page.
		 */
		kvm_mmu_gfn_disallow_lpage(slot, gfn);

From a performance perspective, kvm_gfn_is_write_tracked() is O(1) time, and
should be very fast for the "pure" TDP MMU case, so I don't think that's a
concern.

This is what I have locally, please holler if you object to landing the code
after the write-tracked check.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 363967a17069..3d0e0c1b5332 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2940,6 +2940,15 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	if (kvm_gfn_is_write_tracked(kvm, slot, gfn))
 		return -EPERM;
 
+	/*
+	 * Only 4KiB mappings can become unsync, and KVM disallows hugepages
+	 * for unsync gfns. Upper-level gPTEs (leaf or non-leaf) are always
+	 * write-protected (see above), thus if the gfn can be mapped with a
+	 * hugepage and isn't write-tracked, it can't be unsync.
+	 */
+	if (!lpage_info_slot(gfn, slot, PG_LEVEL_2M)->disallow_lpage)
+		return 0;
+
 	/*
 	 * The page is not write-tracked, mark existing shadow pages unsync
 	 * unless KVM is synchronizing an unsync SP. In that case, KVM must


> 	/*
> 	 * Force write-protection if the page is being tracked. Note, the page
> 	 * track machinery is used to write-protect upper-level shadow pages,
> --
> 2.19.1.6.gb485710b
>