Re: [RFC PATCH v3 53/59] KVM: x86: Add a helper function to restore 4 host MSRs on exit to user space

From: Chao Gao
Date: Fri Nov 26 2021 - 04:11:29 EST


On Thu, Nov 25, 2021 at 09:34:59PM +0100, Thomas Gleixner wrote:
>On Wed, Nov 24 2021 at 16:20, isaku yamahata wrote:
>> From: Chao Gao <chao.gao@xxxxxxxxx>
>
>> $Subject: KVM: x86: Add a helper function to restore 4 host MSRs on exit to user space
>
>Which user space are you talking about? This subject line is misleading

Host Ring3.

>at best. The unconditional reset is happening when a TDX VM exits
>because the SEAM firmware enforces this to prevent information leaks.

Yes.

>
>It also does not matter whether these are four or ten MSRs.

Indeed, the number of MSRs doesn't matter.

>Fact is that
>the SEAM firmware is buggy because it does not save/restore those MSRs.

It is done deliberately. It gives the host a chance to do "lazy" restoration.
"Lazy" means the MSRs are not saved/restored on each TD entry/exit; instead,
restoration is deferred until it is necessary, e.g., when the vCPU is
scheduled out or when the kernel is about to return to Ring3.

>
>So the proper subject line is:
>
> KVM: x86: Add infrastructure to handle MSR corruption by broken TDX firmware

I rewrote the commit message:

KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o wrmsr

Several MSRs are constant and only used in userspace. But VMs may have
different values. KVM uses kvm_set_user_return_msr() to switch to the guest's
values and leverages the user return notifier to restore them when the kernel
is about to return to userspace. To avoid unnecessary wrmsr, KVM also caches
the value it last wrote to each MSR.

The TDX module unconditionally resets some of these MSRs to architectural
INIT state on TD exit. This makes the cached values in kvm_user_return_msrs
inconsistent with the values in hardware. This inconsistency needs to be
fixed; otherwise, it may mislead kvm_on_user_return() into skipping the
restoration of some MSRs to the host's values. kvm_set_user_return_msr() can
correct this case, but it is not optimal as it always does a wrmsr. So,
introduce a variation of kvm_set_user_return_msr() that updates the cached
value but skips the wrmsr.
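
To illustrate the stale-cache problem described in this commit message, here
is a rough user-space model. It is only a sketch, not the kernel code: the
struct, the model_* function names and the simulated "hardware" array are all
hypothetical, and the real kvm_user_return_msrs handling differs in detail
(per-CPU data, masks, error handling).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names exist in KVM. */
#define MODEL_NR_MSRS 4

struct msr_model {
	uint64_t hw[MODEL_NR_MSRS];   /* simulated hardware MSR contents */
	uint64_t host[MODEL_NR_MSRS]; /* host values to restore */
	uint64_t curr[MODEL_NR_MSRS]; /* cached "last value written" */
	bool notifier_armed;          /* user return notifier registered? */
	unsigned int wrmsr_count;     /* number of simulated wrmsr calls */
};

/* Stand-in for wrmsr: update the simulated hardware and count the write. */
static void model_wrmsr(struct msr_model *m, int slot, uint64_t value)
{
	m->hw[slot] = value;
	m->wrmsr_count++;
}

static void model_init(struct msr_model *m, const uint64_t *host)
{
	for (int i = 0; i < MODEL_NR_MSRS; i++)
		m->hw[i] = m->host[i] = m->curr[i] = host[i];
	m->notifier_armed = false;
	m->wrmsr_count = 0;
}

/* Switch to a guest value: write the MSR, cache it, arm the notifier. */
static void model_set_msr(struct msr_model *m, int slot, uint64_t value)
{
	if (value == m->curr[slot])
		return;
	model_wrmsr(m, slot, value);
	m->curr[slot] = value;
	m->notifier_armed = true;
}

/*
 * The proposed variation: something else (here, the firmware) already
 * changed the hardware value, so only fix up the cache and arm the
 * notifier. No wrmsr is issued.
 */
static void model_update_cache(struct msr_model *m, int slot, uint64_t value)
{
	m->curr[slot] = value;
	m->notifier_armed = true;
}

/* On return to userspace: restore only MSRs whose cache differs from host. */
static void model_on_user_return(struct msr_model *m)
{
	if (!m->notifier_armed)
		return;
	for (int i = 0; i < MODEL_NR_MSRS; i++) {
		if (m->curr[i] != m->host[i]) {
			model_wrmsr(m, i, m->host[i]);
			m->curr[i] = m->host[i];
		}
	}
	m->notifier_armed = false;
}
```

In this model, if the hardware value of a slot is changed behind the cache's
back (as the TDX module does on TD exit), model_on_user_return() does not
notice and leaves the reset value in place. Calling model_update_cache() with
the reset value first makes the next model_on_user_return() restore the host
value with a single write.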

>
>> The TDX module unconditionally resets 4 host MSRs (MSR_SYSCALL_MASK,
>> MSR_STAR, MSR_LSTAR, MSR_TSC_AUX) to architectural INIT state on exit from
>> TDX VM to KVM. KVM needs to save their values before TD enter and restore
>> them on exit to userspace.
>>
>> Reuse current kvm_user_return mechanism and introduce a function to update
>> cached values and register the user return notifier in this new function.
>>
>> The later patch will use the helper function to save/restore 4 host
>> MSRs.
>
>'The later patch ...' is useless information. Of course there will be a
>later patch to make use of this which is implied by 'Add infrastructure
>...'. Can we please get rid of these useless phrases which have no value
>at patch submission time and are even more confusing once the pile is
>merged?

Of course. Will remove all "later patch" phrases.

Thanks
Chao

>
>Thanks,
>
> tglx