Re: [PATCH Part2 RFC v4 33/40] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT

From: Brijesh Singh
Date: Mon Jul 19 2021 - 10:19:43 EST




On 7/16/21 4:00 PM, Sean Christopherson wrote:
On Wed, Jul 07, 2021, Brijesh Singh wrote:
+static int __snp_handle_psc(struct kvm_vcpu *vcpu, int op, gpa_t gpa, int level)

I can live with e.g. GHCB_MSR_PSC_REQ, but I'd strongly prefer to spell this out,
e.g. __snp_handle_page_state_change() or whatever. I had a hell of a time figuring
out what PSC was the first time I saw it in some random context.

Based on the previous review feedback I renamed from __snp_handle_page_state_change to __snp_handle_psc(). I will see what others say and based on that will rename accordingly.


+{
+ struct kvm *kvm = vcpu->kvm;
+ int rc, tdp_level;
+ kvm_pfn_t pfn;
+ gpa_t gpa_end;
+
+ gpa_end = gpa + page_level_size(level);
+
+ while (gpa < gpa_end) {
+ /*
+ * Get the pfn and level for the gpa from the nested page table.
+ *
+ * If the TDP walk failed, then its safe to say that we don't have a valid
+ * mapping for the gpa in the nested page table. Create a fault to map the
+ * page is nested page table.
+ */
+ if (!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &tdp_level)) {
+ pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, level);
+ if (is_error_noslot_pfn(pfn))
+ goto out;
+
+ if (!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &tdp_level))
+ goto out;
+ }
+
+ /* Adjust the level so that we don't go higher than the backing page level */
+ level = min_t(size_t, level, tdp_level);
+
+ write_lock(&kvm->mmu_lock);

Retrieving the PFN and level outside of mmu_lock is not correct. Because the
pages are pinned and the VMM is not malicious, it will function as intended, but
it is far from correct.


Good point, I should have retrieved the pfn and level inside the lock.

The overall approach also feels wrong, e.g. a guest won't be able to convert a
2mb chunk back to a 2mb large page if KVM mapped the GPA as a 4kb page in the
past (from a different conversion).


Maybe I am missing something, I am not able to follow 'guest won't be able to convert a 2mb chunk back to a 2mb large page'. The page-size used inside the guest have to relationship with the RMP/NPT page-size. e.g, a guest can validate the page range as a 4k and still map the page range as a 2mb or 1gb in its pagetable.


I'd also strongly prefer to have a common flow between SNP and TDX for converting
between shared/prviate.

I'll circle back to this next week, it'll probably take a few hours of staring
to figure out a solution, if a common one for SNP+TDX is even possible.


Sounds good.

+
+ switch (op) {
+ case SNP_PAGE_STATE_SHARED:
+ rc = snp_make_page_shared(vcpu, gpa, pfn, level);
+ break;
+ case SNP_PAGE_STATE_PRIVATE:
+ rc = snp_make_page_private(vcpu, gpa, pfn, level);
+ break;
+ default:
+ rc = -EINVAL;
+ break;
+ }
+
+ write_unlock(&kvm->mmu_lock);
+
+ if (rc) {
+ pr_err_ratelimited("Error op %d gpa %llx pfn %llx level %d rc %d\n",
+ op, gpa, pfn, level, rc);
+ goto out;
+ }
+
+ gpa = gpa + page_level_size(level);
+ }
+
+out:
+ return rc;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2941,6 +3063,25 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PSC_REQ: {
+ gfn_t gfn;
+ int ret;
+ u8 op;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_GFN_MASK, GHCB_MSR_PSC_GFN_POS);
+ op = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_OP_MASK, GHCB_MSR_PSC_OP_POS);
+
+ ret = __snp_handle_psc(vcpu, op, gfn_to_gpa(gfn), PG_LEVEL_4K);
+
+ /* If failed to change the state then spec requires to return all F's */

That doesn't mesh with what I could find:

o 0x015 – SNP Page State Change Response
▪ GHCBData[63:32] – Error code
▪ GHCBData[31:12] – Reserved, must be zero
Written by the hypervisor in response to a Page State Change request. Any non-
zero value for the error code indicates that the page state change was not
successful.

And if "all Fs" is indeed the error code, 'int ret' probably only works by luck
since the return value is a 64-bit value, where as ret is a 32-bit signed int.

+ if (ret)
+ ret = -1;

Uh, this is fubar. You've created a shadow of 'ret', i.e. the outer ret is likely
uninitialized.


Ah, let me fix it in next rev.

thanks