Re: [PATCH v2] KVM: x86/mmu: Expose number of shadow MMU shadow pages as a stat

From: Yosry Ahmed

Date: Mon Jun 15 2026 - 19:58:24 EST


On Mon, Jun 15, 2026 at 4:46 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Mon, Jun 15, 2026, Yosry Ahmed wrote:
> > On Fri, Jun 12, 2026 at 6:37 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > Turn arch.n_used_mmu_pages into a stat, mmu_shadow_pages, as the number of
> > > live shadow pages is arguably _the_ most critical datapoint when it comes
> > > to analyzing the shadow MMU. Before the TDP MMU came along, i.e. when the
> > > shadow MMU was the only MMU, explicitly tracking the number of shadow pages
> > > wasn't as interesting, because the same information could more or less be
> > > gleaned from the pages_{1g,2m,4k} stats. But with the TDP MMU, where the
> > > shadow MMU is only used for nested TDP, it becomes extremely difficult, if
> > > not impossible, to determine which SPTEs are coming from the TDP MMU, and
> > > which are coming from the shadow MMU.
> > >
> > > E.g. when triaging/debugging shadow MMU performance issues due to "too many
> > > shadow pages", being able to observe that 99%+ of all shadow pages are
> > > unsync is critical to being able to deduce that KVM is effectively leaking
> > > shadow pages.
> >
> > Why not expose indirect_shadow_pages? IIRC that was also one of the
> > stats we (mostly you) used while debugging?
>
> Because it's a subset of mmu_shadow_pages, and I suspect mmu_shadow_pages will
> be more helpful if we're only providing one of the two? E.g. if the problem is
> that KVM is leaking indirect shadow pages, then either number will suffice. But
> if KVM is zapping old SPTEs due to the KVM_SET_NR_MMU_PAGES limit, then we really
> want to see mmu_shadow_pages, otherwise there will be a blind spot with respect
> to direct shadow pages. And if there's a bug that's specific to direct shadow pages,
> then we're probably hosed either way, because it will be difficult to observe just
> the direct shadow pages (unless they happen to be the _only_ pages, which is very
> unlikely, but then we'd still want mmu_shadow_pages).
>
> In practice, thanks to the TDP MMU deliberately _not_ accounting its pages as
> shadow pages, the delta between the two values will be tiny on setups with TDP
> enabled, i.e. on practically every modern deployment. Because hypervisor page
> tables are typically tree-like, and hugepages are, well, huge, the number of
> direct shadow pages in indirect MMUs will be counted in tens or hundreds, out of
> thousands or tens of thousands of total shadow pages.

Yeah if we're choosing one, I think mmu_shadow_pages is more valuable.
What do we lose if we make both of them stats tho?
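To make the subset relationship concrete, here is a minimal sketch in plain C.
The struct and helper names are hypothetical (this is not KVM's stat
plumbing). The only point is that, because indirect shadow pages are a subset
of all shadow pages, exporting both counters would also yield the number of
direct shadow pages by simple subtraction.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model: NOT KVM's actual structures. It only
 * tracks the two counters discussed in this thread.
 */
struct shadow_page_stats {
	unsigned long mmu_shadow_pages;      /* all shadow MMU pages */
	unsigned long indirect_shadow_pages; /* subset that shadows gPTEs */
};

/* Account a new shadow page; 'direct' means there are no gPTEs to shadow. */
static void account_shadow_page(struct shadow_page_stats *s, bool direct)
{
	s->mmu_shadow_pages++;
	if (!direct)
		s->indirect_shadow_pages++;
}

/* Unaccount a freed shadow page; counters must never underflow. */
static void unaccount_shadow_page(struct shadow_page_stats *s, bool direct)
{
	assert(s->mmu_shadow_pages > 0);
	s->mmu_shadow_pages--;
	if (!direct) {
		assert(s->indirect_shadow_pages > 0);
		s->indirect_shadow_pages--;
	}
}

/* Direct shadow pages are whatever remains after removing indirect ones. */
static unsigned long direct_shadow_pages(const struct shadow_page_stats *s)
{
	return s->mmu_shadow_pages - s->indirect_shadow_pages;
}
```

E.g. after accounting 1000 indirect and 20 direct pages and then freeing one
direct page, the model reports 1019 total, 1000 indirect and 19 direct.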

>
> > I guess in most cases, mmu_shadow_pages will represent either the MMU
> > pages used to shadow the VM's x86 page tables (with TDP off) or nested
> > TDP MMU pages (with TDP on and nested used) -- but I do remember some
> > interesting case involving direct mappings in the shadow MMU or something?
>
> Yes, there can be direct shadow pages in an indirect MMU (the guest is using a page
> size that is larger than the host's, in which case there are no gPTEs to shadow and
> thus the gva=>gpa / l2_gpa=>l1_gpa translations in KVM's shadow pages are "direct").

I never understood why these are "direct" tho. Sure, they do not
directly correspond to gPTEs, but they are still shadowing guest
mappings. IOW, if the guest has a PMD mapping and KVM has some PTE
mappings, aren't those PTE mappings still shadowing the PMD mapping,
and don't they still need to be sync'd in the same way they would if
they were shadowing PTEs? The main difference I can think of is the
1:n relationship?
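A minimal, hypothetical C model of that 1:n relationship, loosely based on how
KVM derives the target gfn of a direct shadow page from the page's base gfn
(the struct and function names below are invented for illustration). Nothing
is read from a gPTE: entry "index" simply maps the base gfn plus the index
scaled by the level, because a single guest huge mapping covers the whole
range.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_LEVEL_BITS 9                        /* 512 entries per table */
#define MODEL_SPTES_PER_PAGE (1u << MODEL_LEVEL_BITS)

typedef uint64_t gfn_t;

/* Hypothetical direct shadow page: base gfn plus level of its entries. */
struct model_shadow_page {
	gfn_t gfn;  /* first gfn covered by this shadow page */
	int level;  /* 1 = entries map 4K, 2 = entries map 2M */
};

/*
 * Target gfn of entry 'index' in a direct shadow page, computed purely
 * from the base gfn since there is no guest page table to consult.
 */
static gfn_t direct_sp_entry_gfn(const struct model_shadow_page *sp,
				 unsigned int index)
{
	assert(index < MODEL_SPTES_PER_PAGE);
	return sp->gfn + ((gfn_t)index << ((sp->level - 1) * MODEL_LEVEL_BITS));
}
```

E.g. a level-1 direct shadow page under a guest 2M mapping at gfn 0x200 maps
gfns 0x200..0x3ff through its 512 entries, all of which hang off that one
guest PMD.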