Re: [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs

From: Yan Zhao
Date: Thu May 11 2023 - 23:23:47 EST

> > Hi Sean,
> > After more thought, do you think checking for KVM internal memslots is necessary?
> I don't think it's necessary per se, but I also can't think of any reason to allow
> it.
> > 	slot = gfn_to_memslot(kvm, gfn);
> > 	if (!slot || slot->id >= KVM_USER_MEM_SLOTS) {
> > 		srcu_read_unlock(&kvm->srcu, idx);
> > 		return -EINVAL;
> > 	}
> >
> > Do we allow write tracking of the APIC access page when an APIC-write
> > VM exit is not desired?
> Allow? Yes.
> But KVM doesn't use write-tracking for anything APICv related, e.g. to disable
> APICv, KVM instead zaps the SPTEs for the APIC access page and on page fault goes
> straight to MMIO emulation.
> Theoretically, the guest could create an intermediate PTE in the APIC access page
> and AFAICT KVM would shadow the access and write-protect the APIC access page.
> But that's benign as the resulting emulation would be handled just like emulated
> FWIW, the other internal memslots, TSS and identity mapped page tables, are used
> if and only if paging is disabled in the guest, i.e. there are no guest PTEs for
> KVM to shadow (and paging must be enabled to enable VMX, so nested EPT is also
> ruled out). So this is theoretically possible only for the APIC access page.
> That changes with KVMGT, but that again should not be problematic. KVM will
> emulate in response to the write-protected page and things go on. E.g. it's
> arguably much weirder that the guest can read/write the identity mapped page
> tables that are used for EPT without unrestricted guest.
> There's no sane reason to allow creating PTEs in the APIC page, but I'm also not
> all that motivated to "fix" things. account_shadowed() isn't expected to fail,
> so KVM would need to check further up the stack, e.g. in walk_addr_generic() by
> open coding a form of kvm_vcpu_gfn_to_hva_prot().
> I _think_ that's the only place KVM would need to add a check, as KVM already
> checks that the root, i.e. CR3, is in a "visible" memslot. I suppose KVM could
> just synthesize triple fault, like it does for the root/CR3 case, but I don't
> like making up behavior.
> In other words, I'm not opposed to disallowing write-tracking internal memslots,
> but I can't think of anything that will break, and so for me personally at least,
> the ROI isn't sufficient to justify writing tests and dealing with any fallout.

It makes sense. Thanks for the explanation.
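
For reference, the check discussed above (reject a gfn that has no memslot, or whose memslot is KVM-internal) can be modelled outside the kernel. This is only a rough userspace sketch, not the patch itself: struct mock_memslot, MOCK_USER_MEM_SLOTS, mock_gfn_to_memslot() and mock_write_track_add_gfn() are all hypothetical stand-ins for the kernel's memslot type, KVM_USER_MEM_SLOTS, gfn_to_memslot() and the page-track entry point, and the SRCU locking from the quoted snippet is left out.

```c
/*
 * Userspace mock only: every type, constant and function here is a
 * hypothetical stand-in, not a kernel definition.
 */
#include <errno.h>
#include <stddef.h>

/* Hypothetical value; stands in for KVM_USER_MEM_SLOTS. */
#define MOCK_USER_MEM_SLOTS 4

struct mock_memslot {
	int id;                 /* ids >= MOCK_USER_MEM_SLOTS model internal slots */
	unsigned long base_gfn; /* first guest frame number covered by the slot */
	unsigned long npages;   /* number of pages in the slot */
};

/* Linear lookup standing in for gfn_to_memslot(). */
static const struct mock_memslot *
mock_gfn_to_memslot(const struct mock_memslot *slots, size_t nslots,
		    unsigned long gfn)
{
	for (size_t i = 0; i < nslots; i++) {
		if (gfn >= slots[i].base_gfn &&
		    gfn - slots[i].base_gfn < slots[i].npages)
			return &slots[i];
	}
	return NULL;
}

/*
 * Models only the validation step: return -EINVAL when the gfn has no
 * memslot or falls in an internal memslot, 0 otherwise. The real code
 * would go on to update the write-tracking count.
 */
static int mock_write_track_add_gfn(const struct mock_memslot *slots,
				    size_t nslots, unsigned long gfn)
{
	const struct mock_memslot *slot;

	slot = mock_gfn_to_memslot(slots, nslots, gfn);
	if (!slot || slot->id >= MOCK_USER_MEM_SLOTS)
		return -EINVAL;

	return 0;
}
```

With one user slot covering gfns 0x100-0x10f and one mock internal slot at gfn 0xfee00, a request for gfn 0x105 is accepted, while requests for 0xfee00 (internal slot) and 0x50 (no slot) are both rejected with -EINVAL.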