Re: [PATCH Part2 RFC v4 33/40] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
From: Sean Christopherson
Date: Mon Jul 19 2021 - 15:44:02 EST
On Mon, Jul 19, 2021, Brijesh Singh wrote:
>
> On 7/16/21 4:00 PM, Sean Christopherson wrote:
> > On Wed, Jul 07, 2021, Brijesh Singh wrote:
> > > +static int __snp_handle_psc(struct kvm_vcpu *vcpu, int op, gpa_t gpa, int level)
> >
> > I can live with e.g. GHCB_MSR_PSC_REQ, but I'd strongly prefer to spell this out,
> > e.g. __snp_handle_page_state_change() or whatever. I had a hell of a time figuring
> > out what PSC was the first time I saw it in some random context.
>
> Based on the previous review feedback I renamed from
> __snp_handle_page_state_change to __snp_handle_psc(). I will see what others
> say and based on that will rename accordingly.
I've no objection to using PSC for enums and whatnot, and I'll happily defer to
Boris for functions in the core kernel and guest, but for KVM I'd really like to
spell out the name for the two or so main handler functions.
> > > + while (gpa < gpa_end) {
> > > + /*
> > > + * Get the pfn and level for the gpa from the nested page table.
> > > + *
> > > + * If the TDP walk failed, then its safe to say that we don't have a valid
> > > + * mapping for the gpa in the nested page table. Create a fault to map the
> > > + * page is nested page table.
> > > + */
> > > + if (!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &tdp_level)) {
> > > + pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, level);
> > > + if (is_error_noslot_pfn(pfn))
> > > + goto out;
> > > +
> > > + if (!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &tdp_level))
> > > + goto out;
> > > + }
> > > +
> > > + /* Adjust the level so that we don't go higher than the backing page level */
> > > + level = min_t(size_t, level, tdp_level);
> > > +
> > > + write_lock(&kvm->mmu_lock);
> >
> > Retrieving the PFN and level outside of mmu_lock is not correct. Because the
> > pages are pinned and the VMM is not malicious, it will function as intended, but
> > it is far from correct.
>
> Good point, I should have retrieved the pfn and level inside the lock.
>
> > The overall approach also feels wrong, e.g. a guest won't be able to convert a
> > 2mb chunk back to a 2mb large page if KVM mapped the GPA as a 4kb page in the
> > past (from a different conversion).
> >
>
> Maybe I am missing something, I am not able to follow 'guest won't be able
> to convert a 2mb chunk back to a 2mb large page'. The page-size used inside
> the guest have to relationship with the RMP/NPT page-size. e.g, a guest can
> validate the page range as a 4k and still map the page range as a 2mb or 1gb
> in its pagetable.
The proposed code walks KVM's TDP and adjusts the RMP level to be the min of the
guest+host levels. Once KVM has installed a 4kb TDP SPTE, that walk will find
the 4kb TDP SPTE and thus operate on the RMP at a 4kb granularity. To allow full
restoration of 2mb PTE+SPTE+RMP, KVM needs to zap the 4kb SPTE(s) at some point
to allow rebuilding a 2mb SPTE.