Re: [PATCH 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL
From: Babu Moger
Date: Thu Dec 10 2020 - 18:01:08 EST
On 12/10/20 3:36 PM, Jim Mattson wrote:
> On Thu, Dec 10, 2020 at 1:26 PM Babu Moger <babu.moger@xxxxxxx> wrote:
>>
>> Hi Jim,
>>
>>> -----Original Message-----
>>> From: Jim Mattson <jmattson@xxxxxxxxxx>
>>> Sent: Monday, December 7, 2020 5:06 PM
>>> To: Moger, Babu <Babu.Moger@xxxxxxx>
>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>; Thomas Gleixner
>>> <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; Borislav Petkov
>>> <bp@xxxxxxxxx>; Yu, Fenghua <fenghua.yu@xxxxxxxxx>; Tony Luck
>>> <tony.luck@xxxxxxxxx>; Wanpeng Li <wanpengli@xxxxxxxxxxx>; kvm list
>>> <kvm@xxxxxxxxxxxxxxx>; Lendacky, Thomas <Thomas.Lendacky@xxxxxxx>;
>>> Peter Zijlstra <peterz@xxxxxxxxxxxxx>; Sean Christopherson
>>> <seanjc@xxxxxxxxxx>; Joerg Roedel <joro@xxxxxxxxxx>; the arch/x86
>>> maintainers <x86@xxxxxxxxxx>; kyung.min.park@xxxxxxxxx; LKML <linux-
>>> kernel@xxxxxxxxxxxxxxx>; Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>; H .
>>> Peter Anvin <hpa@xxxxxxxxx>; mgross@xxxxxxxxxxxxxxx; Vitaly Kuznetsov
>>> <vkuznets@xxxxxxxxxx>; Phillips, Kim <kim.phillips@xxxxxxx>; Huang2, Wei
>>> <Wei.Huang2@xxxxxxx>
>>> Subject: Re: [PATCH 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL
>>>
>>> On Mon, Dec 7, 2020 at 2:38 PM Babu Moger <babu.moger@xxxxxxx> wrote:
>>>>
>>>> Newer AMD processors have a feature to virtualize the use of the
>>>> SPEC_CTRL MSR. When supported, the SPEC_CTRL MSR is automatically
>>>> virtualized and no longer requires hypervisor intervention.
>>>>
>>>> This feature is detected via CPUID function 0x8000000A_EDX[20]:
>>>> GuestSpecCtrl.
>>>>
>>>> Hypervisors are not required to enable this feature since it is
>>>> automatically enabled on processors that support it.
>>>>
>>>> When this feature is enabled, the hypervisor no longer has to
>>>> intercept the usage of the SPEC_CTRL MSR and no longer is required to
>>>> save and restore the guest SPEC_CTRL setting when switching
>>>> hypervisor/guest modes. The effective SPEC_CTRL setting is the guest
>>>> SPEC_CTRL setting or'ed with the hypervisor SPEC_CTRL setting. This
>>>> allows the hypervisor to ensure a minimum SPEC_CTRL if desired.
>>>>
>>>> This support also fixes an issue where a guest may sometimes see an
>>>> inconsistent value for the SPEC_CTRL MSR on processors that support
>>>> this feature. With the current SPEC_CTRL support, the first write to
>>>> SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
>>>> MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
>>>> will be 0x0, instead of the actual expected value. There isn’t a
>>>> security concern here, because the host SPEC_CTRL value is or’ed with
>>>> the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
>>>> KVM writes the guest's virtualized SPEC_CTRL value to the SPEC_CTRL
>>>> MSR just before the VMRUN, so it will always have the actual value
>>>> even though it doesn’t appear that way in the guest. The guest will
>>>> only see the proper value for the SPEC_CTRL register if the guest were
>>>> to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
>>>> support, the MSR interception of SPEC_CTRL is disabled during
>>>> vmcb_init, so this will no longer be an issue.
>>>>
>>>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>>>> ---
>>>
>>> Shouldn't there be some code to initialize a new "guest SPEC_CTRL"
>>> value in the VMCB, both at vCPU creation, and at virtual processor reset?
>>
>> Yes, I think so. I will check on this.
>>
>>>
>>>> arch/x86/kvm/svm/svm.c | 17 ++++++++++++++---
>>>> 1 file changed, 14 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>> index 79b3a564f1c9..3d73ec0cdb87 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -1230,6 +1230,14 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>>
>>>> svm_check_invpcid(svm);
>>>>
>>>> + /*
>>>> + * If the host supports V_SPEC_CTRL then disable the interception
>>>> + * of MSR_IA32_SPEC_CTRL.
>>>> + */
>>>> + if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>>>> + set_msr_interception(&svm->vcpu, svm->msrpm,
>>>> + MSR_IA32_SPEC_CTRL, 1, 1);
>>>> +
>>>> if (kvm_vcpu_apicv_active(&svm->vcpu))
>>>> avic_init_vmcb(svm);
>>>>
>>>> @@ -3590,7 +3598,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>>> * is no need to worry about the conditional branch over the wrmsr
>>>> * being speculatively taken.
>>>> */
>>>> - x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
>>>> + if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>>>> + x86_spec_ctrl_set_guest(svm->spec_ctrl,
>>>> + svm->virt_spec_ctrl);
>>>
>>> Is this correct for the nested case? Presumably, there is now a "guest
>>> SPEC_CTRL" value somewhere in the VMCB. If L1 does not intercept this MSR,
>>> then we need to transfer the "guest SPEC_CTRL" value from the
>>> vmcb01 to the vmcb02, don't we?
>>
>> Here is the text from the to-be-published documentation.
>> "When in host mode, the host SPEC_CTRL value is in effect and writes
>> update only the host version of SPEC_CTRL. On a VMRUN, the processor loads
>> the guest version of SPEC_CTRL from the VMCB. For non-SNP enabled guests,
>> processor behavior is controlled by the logical OR of the two registers.
>> When the guest writes SPEC_CTRL, only the guest version is updated. On a
>> VMEXIT, the guest version is saved into the VMCB and the processor returns
>> to only using the host SPEC_CTRL for speculation control. The guest
>> SPEC_CTRL is located at offset 0x2E0 in the VMCB." This offset is into
>> the save area of the VMCB (i.e. 0x400 + 0x2E0).
>>
>> The feature X86_FEATURE_V_SPEC_CTRL will not be advertised to guests.
>> So, the guest will use the same mechanism as today where it will save and
>> restore the value into/from svm->spec_ctrl. If the value saved in the VMSA
>> is left untouched, both an L1 and L2 guest will get the proper value.
>> What matters is the initial setup of vmcb01 and vmcb02 when this
>> feature is available on the host (bare metal). I am going to investigate
>> that part. Do you still think I am missing something here?
>
> It doesn't matter whether X86_FEATURE_V_SPEC_CTRL is advertised to L1
> or not. If L1 doesn't virtualize MSR_SPEC_CTRL for L2, then L1 and L2
> share the same value for that MSR. With this change, the current value
> in vmcb01 is only in vmcb01, and doesn't get propagated anywhere else.
> Hence, if L1 changes the value of MSR_SPEC_CTRL, that change is not
> visible to L2.
>
> Thinking about what Sean said about live migration, I think the
> correct solution here is that the authoritative value for this MSR
> should continue to live in svm->spec_ctrl. When the CPU supports
> X86_FEATURE_V_SPEC_CTRL, we should just transfer the value into the
> VMCB prior to VMRUN and out of the VMCB after #VMEXIT.
Ok. Got it. I will try this approach. Thanks for the suggestion.
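
To make sure the suggested flow is understood the same way on both sides, here is a minimal userspace sketch of it. This is not kernel code: every toy_* name and both struct layouts are made up for illustration, and has_v_spec_ctrl simply stands in for a check of X86_FEATURE_V_SPEC_CTRL. The point is that the software copy stays authoritative, so whichever VMCB is active (vmcb01 or vmcb02) picks up the current value before VMRUN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the KVM structures; not the real layouts. */
struct toy_vmcb_save {
	uint64_t spec_ctrl;		/* guest SPEC_CTRL slot in the save area */
};

struct toy_vcpu_svm {
	uint64_t spec_ctrl;		/* authoritative software copy */
	struct toy_vmcb_save *save;	/* save area of the currently active VMCB */
};

/* Before VMRUN: push the authoritative value into the active VMCB. */
static void toy_pre_vmrun(struct toy_vcpu_svm *svm, bool has_v_spec_ctrl)
{
	assert(svm->save != NULL);
	if (has_v_spec_ctrl)
		svm->save->spec_ctrl = svm->spec_ctrl;
}

/* After #VMEXIT: pull back whatever the guest left in the active VMCB. */
static void toy_post_vmexit(struct toy_vcpu_svm *svm, bool has_v_spec_ctrl)
{
	assert(svm->save != NULL);
	if (has_v_spec_ctrl)
		svm->spec_ctrl = svm->save->spec_ctrl;
}
```

With this shape, a nested switch only has to repoint svm->save; the value follows because it is reloaded from the software copy on the next VMRUN.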
>
>>
>>>
>>>> svm_vcpu_enter_exit(vcpu, svm);
>>>>
>>>> @@ -3609,12 +3618,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>>> * If the L02 MSR bitmap does not intercept the MSR, then we need to
>>>> * save it.
>>>> */
>>>> - if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
>>>> + if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL) &&
>>>> + unlikely(!msr_write_intercepted(vcpu,
>>>> + MSR_IA32_SPEC_CTRL)))
>>>> svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
>>>
>>> Is this correct for the nested case? If L1 does not intercept this MSR, then it
>>> might have changed while L2 is running. Presumably, the hardware has stored
>>> the new value somewhere in the vmcb02 at #VMEXIT, but now we need to move
>>> that value into the vmcb01, don't we?
>>>
>>>> reload_tss(vcpu);
>>>>
>>>> - x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);
>>>> + if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
>>>> + x86_spec_ctrl_restore_host(svm->spec_ctrl,
>>>> + svm->virt_spec_ctrl);
>>>>
>>>> vcpu->arch.cr2 = svm->vmcb->save.cr2;
>>>> vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
>>>>
>>>
>>> It would be great if you could add some tests to kvm-unit-tests.
>>
>> Yes. I will check on this part.
>>
>> Thanks
>> Babu
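
For reference, the two facts from the documentation text quoted above (the effective value is the logical OR of the host and guest values, and the guest value sits at 0x2E0 into the save area, which starts at 0x400) can be written out as a small standalone sketch. The TOY_* and toy_* names are hypothetical and exist only for this illustration; the bit positions are the architectural SPEC_CTRL bits.

```c
#include <stdint.h>

/* Architectural SPEC_CTRL bits. */
#define TOY_SPEC_CTRL_IBRS	(1ULL << 0)
#define TOY_SPEC_CTRL_STIBP	(1ULL << 1)
#define TOY_SPEC_CTRL_SSBD	(1ULL << 2)

/* Offsets as given in the quoted documentation text. */
#define TOY_VMCB_SAVE_AREA_OFFSET	0x400u
#define TOY_SAVE_SPEC_CTRL_OFFSET	0x2E0u

/* Byte offset of the guest SPEC_CTRL from the start of the VMCB. */
static inline uint32_t toy_guest_spec_ctrl_vmcb_offset(void)
{
	return TOY_VMCB_SAVE_AREA_OFFSET + TOY_SAVE_SPEC_CTRL_OFFSET;
}

/*
 * While the guest runs (non-SNP case), the processor behaves as if
 * SPEC_CTRL held the OR of the host and guest values, so the host can
 * enforce a minimum setting that the guest cannot clear.
 */
static inline uint64_t toy_effective_spec_ctrl(uint64_t host, uint64_t guest)
{
	return host | guest;
}
```

So a host that sets SSBD and a guest that sets only IBRS would run with both bits in effect, while the guest still reads back only its own value.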