Re: [PATCH v19 059/130] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases
From: Chao Gao
Date: Thu Mar 28 2024 - 09:39:45 EST
On Thu, Mar 28, 2024 at 09:21:37PM +0800, Xiaoyao Li wrote:
>On 3/28/2024 6:17 PM, Chao Gao wrote:
>> On Thu, Mar 28, 2024 at 11:40:27AM +0800, Xiaoyao Li wrote:
>> > On 3/28/2024 11:04 AM, Edgecombe, Rick P wrote:
>> > > On Thu, 2024-03-28 at 09:30 +0800, Xiaoyao Li wrote:
>> > > > > The current ABI of KVM_EXIT_X86_RDMSR when TDs are created is nothing. So I don't see how this is
>> > > > > any kind of ABI break. If you agree we shouldn't try to support MTRRs, do you have a different exit
>> > > > > reason or behavior in mind?
>> > > >
>> > > > Just return error on TDVMCALL of RDMSR/WRMSR on TD's access of MTRR MSRs.
>> > >
>> > > MTRR appears to be configured to be type "Fixed" in the TDX module. So the guest could expect to be
>> > > able to use it and be surprised by a #GP.
>> > >
>> > > {
>> > > "MSB": "12",
>> > > "LSB": "12",
>> > > "Field Size": "1",
>> > > "Field Name": "MTRR",
>> > > "Configuration Details": null,
>> > > "Bit or Field Virtualization Type": "Fixed",
>> > > "Virtualization Details": "0x1"
>> > > },
>> > >
>> > > If KVM does not support MTRRs in TDX, then it has to return the error somewhere or pretend to
>> > > support it (do nothing but not return an error). Returning an error to the guest would be making up
>> > > arch behavior, and to a lesser degree so would ignoring the WRMSR.
>> >
>> > The root cause is that it's a bad design of TDX to make MTRR fixed1. When
>> > guest reads MTRR CPUID as 1 while getting #VE on MTRR MSRs, it already breaks
>> > the architectural behavior. (MCA faces a similar issue: MCA is fixed1 as
>>
>> I won't say #VE on MTRR MSRs breaks anything. Writes to other MSRs (e.g.
>> TSC_DEADLINE MSR) also lead to #VE. If KVM can emulate the MSR accesses, #VE
>> should be fine.
>>
>> The problem is: MTRR CPUID feature is fixed 1 while KVM/QEMU doesn't know how
>> to virtualize MTRR especially given that KVM cannot control the memory type in
>> secure-EPT entries.
>
>Yes, I partly agree on that "#VE on MTRR MSRs breaks anything". #VE is not a
>problem; the problem is whether the #VE is opt-in or unconditional.
From the guest's point of view, there is no difference: the guest doesn't know whether a feature
is opted in or not.
>
>For the TSC_DEADLINE_MSR, #VE is opt-in actually.
>CPUID(1).ECX[24].TSC_DEADLINE is configurable by VMM. Only when VMM
>configures the bit to 1, will the TD guest get #VE. If VMM configures it to
>0, TD guest just gets #GP. This is the reasonable design.
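To make the opt-in vs. unconditional distinction concrete, here is a rough model of the two cases described above. It is only a sketch: the enum, macro, and function names are made up for illustration and are not KVM or TDX module identifiers. It assumes MTRR MSR accesses always end in #VE, as discussed earlier in this thread.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not KVM or TDX module symbols. */
enum td_msr_outcome {
	TD_MSR_GP,	/* guest gets #GP: the feature is not exposed */
	TD_MSR_VE,	/* guest gets #VE and is expected to issue a TDVMCALL */
};

/* CPUID(1).ECX[24] advertises TSC_DEADLINE. */
#define CPUID_1_ECX_TSC_DEADLINE	(UINT32_C(1) << 24)

/*
 * TSC_DEADLINE MSR: the outcome depends on the CPUID bit the VMM
 * configured for the TD, so #VE is opt-in.
 */
static enum td_msr_outcome tsc_deadline_access(uint32_t cpuid_1_ecx)
{
	return (cpuid_1_ecx & CPUID_1_ECX_TSC_DEADLINE) ? TD_MSR_VE : TD_MSR_GP;
}

/*
 * MTRR MSRs: the CPUID bit is fixed to 1, so there is nothing for the
 * VMM to configure and the #VE is unconditional.
 */
static enum td_msr_outcome mtrr_access(void)
{
	return TD_MSR_VE;
}
```

In this model the VMM can pick either outcome for TSC_DEADLINE, while mtrr_access() has no input at all, which is the asymmetry being debated.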
>
>> > well while accessing MCA related MSRs gets #VE. This is why TDX is going to
>> > fix them by introducing new feature and make them configurable)
>> >
>> > > So that is why I lean towards
>> > > returning to userspace and giving the VMM the option to ignore it, return an error to the guest or
>> > > show an error to the user.
>> >
>> > "show an error to the user" doesn't help at all, because the user cannot fix it,
>> > and neither can QEMU.
>>
>> The key point isn't who can fix/emulate MTRR MSRs. It is just that KVM doesn't know
>> how to handle this situation and asks userspace for help.
>>
>> Whether or how userspace can handle the MSR writes isn't KVM's problem. It may be
>> better if KVM can tell userspace exactly in which cases KVM will exit to
>> userspace. But there is no such infrastructure.
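For illustration, the three userspace options mentioned above (ignore, return an error to the guest, show an error to the user) could look roughly like this on the VMM side. This is a sketch under assumptions: struct msr_exit and the policy enum are hypothetical and only loosely modeled on the MSR exit data in struct kvm_run, and the variable-range check assumes 8 variable MTRR pairs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical VMM policy knob; not an existing QEMU or KVM option. */
enum mtrr_policy { MTRR_IGNORE, MTRR_FAIL_GUEST, MTRR_REPORT_TO_USER };

/* Hypothetical exit record, loosely modeled on the msr data in struct kvm_run. */
struct msr_exit {
	uint32_t index;
	uint64_t data;
	bool is_write;
	uint8_t error;	/* nonzero: the access fails in the guest */
};

/* MTRR MSR indices per the SDM; assumes 8 variable-range pairs. */
static bool is_mtrr_msr(uint32_t index)
{
	return index == 0xfe ||				/* MTRRcap */
	       (index >= 0x200 && index <= 0x20f) ||	/* PHYSBASEn/PHYSMASKn */
	       index == 0x250 || index == 0x258 || index == 0x259 ||
	       (index >= 0x268 && index <= 0x26f) ||	/* fixed 4K ranges */
	       index == 0x2ff;				/* MTRRdefType */
}

/* Returns true to resume the guest, false to stop and surface the error. */
static bool handle_mtrr_exit(struct msr_exit *ex, enum mtrr_policy policy)
{
	if (!is_mtrr_msr(ex->index)) {
		ex->error = 1;		/* not covered by this sketch */
		return true;
	}

	switch (policy) {
	case MTRR_IGNORE:
		if (!ex->is_write)
			ex->data = 0;	/* reads return 0, writes are dropped */
		ex->error = 0;
		return true;
	case MTRR_FAIL_GUEST:
		ex->error = 1;
		return true;
	case MTRR_REPORT_TO_USER:
	default:
		fprintf(stderr, "unsupported MTRR MSR access: 0x%x\n",
			(unsigned int)ex->index);
		return false;
	}
}
```

Which of the three a VMM picks is a policy decision; the point is only that KVM does not have to make that choice itself.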
>>
>> An example: in the KVM CET series, we found it complex for the KVM instruction
>> emulator to emulate control flow instructions when CET is enabled. The
>> suggestion is also to punt to userspace (w/o any indication to userspace that
>> KVM would do this).
>
>Could you point me to the decision on CET? I'm interested in how userspace can help
>on that.
https://lore.kernel.org/kvm/ZZgsipXoXTKyvCZT@xxxxxxxxxx/
>
>> >
>> > > If KVM can't support the behavior, better to get an actual error in
>> > > userspace than a mysterious guest hang, right?
>> > What behavior do you mean?
>> >
>> > > Outside of what kind of exit it is, do you object to the general plan to punt to userspace?
>> > >
>> > > Since this is a TDX specific limitation, I guess there is KVM_EXIT_TDX_VMCALL as a general category
>> > > of TDVMCALLs that cannot be handled by KVM.
>>
>> Using KVM_EXIT_TDX_VMCALL looks fine.
>>
>> We need to explain why MTRR MSRs are handled in this way unlike other MSRs.
>>
>> It is better if KVM can tell userspace that MTRR virtualization isn't supported
>> by KVM for TDs. Then userspace should resolve the conflict between KVM and TDX
>> module on MTRR. But to report MTRR as unsupported, we need to make
>> GET_SUPPORTED_CPUID a vm-scope ioctl. I am not sure if it is worth the effort.
>
>My memory is that Sean disliked the vm-scope GET_SUPPORTED_CPUID for TDX when
>he was at Intel.
Ok. No strong opinion on this.