Re: [PATCH v19 059/130] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases

From: Isaku Yamahata
Date: Mon Mar 25 2024 - 19:11:10 EST


On Mon, Mar 25, 2024 at 03:18:36PM -0700,
Isaku Yamahata <isaku.yamahata@xxxxxxxxx> wrote:

> On Mon, Mar 25, 2024 at 07:55:04PM +0000,
> "Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> wrote:
>
> > On Mon, 2024-03-25 at 12:05 -0700, Isaku Yamahata wrote:
> > > Right, the guest has to accept it on #VE.  If the unmap was intentional by the
> > > guest, that's fine.  If the unmap is unintentional (as with vMTRR), the guest
> > > doesn't expect a #VE for the GPA.
> > >
> > >
> > > > But, I guess we should punt to userspace if the guest tries to use
> > > > MTRRs, not that userspace can handle it happening in a TD...  But it
> > > > seems cleaner and safer than skipping the zapping of some pages
> > > > inside the zapping code.
> > > >
> > > > I'm still not sure if I understand the intention and constraints fully.
> > > > So please correct. This (the skipping the zapping for some operations)
> > > > is a theoretical correctness issue right? It doesn't resolve a TD
> > > > crash?
> > >
> > > For the lapic, it's a safeguard.  Because TDX KVM disables APICv with
> > > APICV_INHIBIT_REASON_TDX, apicv won't call kvm_zap_gfn_range().
> > Ah, I see it:
> > https://lore.kernel.org/lkml/38e2f8a77e89301534d82325946eb74db3e47815.1708933498.git.isaku.yamahata@xxxxxxxxx/
> >
> > Then a warning seems more appropriate if we are worried there might still be a way to
> > call it.  If we are confident it can't be called, then we can just ignore this case.
> >
> > >
> > > For MTRR, the purpose is to make the guest boot (without a guest kernel
> > > command-line option like clearcpuid=mtrr).
> > > If we can assume the guest won't somehow touch the MTRR registers, KVM can
> > > return an error to TDG.VP.VMCALL<RDMSR, WRMSR> for the MTRR registers, so it
> > > doesn't call kvm_zap_gfn_range().  Or we can use KVM_EXIT_X86_{RDMSR, WRMSR}
> > > as you suggested.
> >
> > My understanding is that Sean prefers to exit to userspace when KVM can't handle
> > something, versus making up behavior that keeps known guests alive.  So I would
> > think we should change this patch to only be about not using the zapping-roots
> > optimization.  Then a separate patch should exit to userspace on attempts to use
> > MTRRs.  And we ignore the APIC one.
> >
> > This is trying to guess what maintainers would want here. I'm less sure what Paolo prefers.
>
> When an MSR access hits KVM_MSR_FILTER, the current implementation ignores
> userspace and returns an error to the guest.  Surely we should use
> KVM_EXIT_X86_{RDMSR, WRMSR} instead.  That aligns with the existing
> implementation (default and SW-protected VMs) and is more flexible.

Something like this for "112/130 KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall".
Compile-tested only at this point.

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index f891de30a2dd..4d9ae5743e24 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1388,31 +1388,69 @@ static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_complete_rdmsr(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->run->msr.error)
+		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
+	else {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_SUCCESS);
+		tdvmcall_set_return_val(vcpu, vcpu->run->msr.data);
+	}
+	return 1;
+}
+
 static int tdx_emulate_rdmsr(struct kvm_vcpu *vcpu)
 {
 	u32 index = tdvmcall_a0_read(vcpu);
 	u64 data;
 
-	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ) ||
-	    kvm_get_msr(vcpu, index, &data)) {
+	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ)) {
+		trace_kvm_msr_read_ex(index);
+		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
+		if (kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_RDMSR, 0,
+				       tdx_complete_rdmsr,
+				       KVM_MSR_RET_FILTERED))
+			return 0;
+		return 1;
+	}
+
+	if (kvm_get_msr(vcpu, index, &data)) {
 		trace_kvm_msr_read_ex(index);
 		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
 		return 1;
 	}
-	trace_kvm_msr_read(index, data);
 
+	trace_kvm_msr_read(index, data);
 	tdvmcall_set_return_code(vcpu, TDVMCALL_SUCCESS);
 	tdvmcall_set_return_val(vcpu, data);
 	return 1;
 }
 
+static int tdx_complete_wrmsr(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->run->msr.error)
+		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
+	else
+		tdvmcall_set_return_code(vcpu, TDVMCALL_SUCCESS);
+	return 1;
+}
+
 static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu)
 {
 	u32 index = tdvmcall_a0_read(vcpu);
 	u64 data = tdvmcall_a1_read(vcpu);
 
-	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE) ||
-	    kvm_set_msr(vcpu, index, data)) {
+	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE)) {
+		trace_kvm_msr_write_ex(index, data);
+		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
+		if (kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_WRMSR, data,
+				       tdx_complete_wrmsr,
+				       KVM_MSR_RET_FILTERED))
+			return 0;
+		return 1;
+	}
+
+	if (kvm_set_msr(vcpu, index, data)) {
 		trace_kvm_msr_write_ex(index, data);
 		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
 		return 1;


--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>