Re: [PATCH] KVM: vmx/nested: avoid blindly setting SECONDARY_EXEC_ENCLS_EXITING when sgx is enabled
From: Emanuele Giuseppe Esposito
Date: Tue Oct 25 2022 - 08:36:47 EST
Am 24/10/2022 um 18:52 schrieb Sean Christopherson:
> "nVMX" instead of "vmx/nested"
>
> On Mon, Oct 24, 2022, Emanuele Giuseppe Esposito wrote:
>> Currently vmx enables SECONDARY_EXEC_ENCLS_EXITING even when sgx
>> is not set in the host MSR.
>> This was probably introduced when sgx was not yet fully supported, and
>> we wanted to trap guests trying to use the feature.
>
> Nah, it's just a boneheaded bug.
>
>> When booting a guest, KVM checks that the cpuid bit is actually set
>> in vmx.c, and if not, it does not enable the feature.
>
> The CPUID thing is a red herring. That's an _additional_ restriction, KVM honors
> MSR_IA32_VMX_PROCBASED_CTLS2 when configuring vmcs01. See adjust_vmx_controls()
> for secondary controls in setup_vmcs_config().
>
>> However, in nesting this control bit is blindly copied, and will be
>
> It's not "copied", KVM sets the bit in the nVMX MSR irrespective of host support,
> which is the problem.
>
>> propagated to VMCS12 and VMCS02. Therefore, when L1 tries to boot
>> the guest, the host will try to execute VMLOAD with VMCS02 containing
>> a feature that the hardware does not support, making it fail with
>> hardware error 0x7.
>>
>> According with section A.3.3 of Intel System Programming Guide,
>> we should *always* check the value in the actual
>
> s/we/software
>
>> MSR_IA32_VMX_PROCBASED_CTLS2 before enabling this bit.
>>
>> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2127128
>>
>
> Fixes: 72add915fbd5 ("KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC")
> Cc: stable@xxxxxxxxxxxxxxx
>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@xxxxxxxxxx>
>> ---
>> arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++++++++--
>> 1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 8f67a9c4a287..f651084010cc 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -6678,6 +6678,25 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void)
>> return (u64)max_idx << VMCS_FIELD_INDEX_SHIFT;
>> }
>>
>> +/*
>> + * According with section A.3.3 of
>
> Avoid referencing sections/tables by "number" in comments (and changelogs), as the
> comment is all but guaranteed to become stale because future versions of the SDM
> will shift things around.
>
> The slightly more robust way to reference a specific SDM/APM section is to use the
> title, e.g. According to section "Secondary Processor-Based VM-Execution Controls"
> in Intel's SDM ...", as hardware vendors are less likely to arbitrarily rename
> sections and tables. It's a bit more work for readers, but any decent PDF viewer
> can search these days.
>
>> Intel System Programming Guide
>
> KVM typically uses "Intel's SDM" (and "AMD's APM"). Like "VMX" or "SVM", it's ok
> to use the SDM acronym without introducing since "SDM" is
>
>> + * we *can* set the guest MSR control X (in our case
>
> Avoid pronouns in comments. "we" and "us" are ambiguous, e.g. "we" can mean KVM,
> the developer, the user, etc...
>
>> + * SECONDARY_EXEC_ENCLS_EXITING) *iff* bit 32+X of
>> + * MSR_IA32_VMX_PROCBASED_CTLS2 is set to 1.
>> + * Otherwise it must remain zero.
>
> As a general rule, if you find yourself writing a comment and a helper for
> something that KVM absolutely needs to get right (honoring VMX MSRs), then odds
> are very good that there's a simpler/easier fix, i.e. that you're effectively
> re-inventing part of the weel.
>
>> + */
>> +static void nested_vmx_setup_encls_exiting(struct nested_vmx_msrs *msrs)
>> +{
>> + u32 vmx_msr_procb_low, vmx_msr_procb_high;
>> +
>> + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, vmx_msr_procb_low, vmx_msr_procb_high);
>> +
>> + WARN_ON(vmx_msr_procb_low & SECONDARY_EXEC_ENCLS_EXITING);
>> +
>> + if (enable_sgx && (vmx_msr_procb_high & SECONDARY_EXEC_ENCLS_EXITING))
>> + msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING;
>> +}
>> +
>> /*
>> * nested_vmx_setup_ctls_msrs() sets up variables containing the values to be
>> * returned for the various VMX controls MSRs when nested VMX is enabled.
>> @@ -6874,8 +6893,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
>> msrs->secondary_ctls_high |=
>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>>
>> - if (enable_sgx)
>> - msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING;
>
> The issue here is that I, who wrote this code long, long ago, copied the pattern
> used for enable_unrestricted_guest, flexpriority_enabled, etc... without fully
> appreciating the logic. Unlike those module params, enable_sgx doesn't track
> hardware support, i.e. enable_sgx isn't cleared when SGX can't be enabled due to
> lack of hardware support. As a result, KVM effectively enumerates to L1 that the
> control is always available, i.e. that KVM emulates ENCLS-exiting for L1, but KVM
> obviously doesn't actually emulating the behavior.
>
> Not updating enable_sgx is responsible for a second bug: vmx_set_cpu_caps() doesn't
> clear the SGX bits when hardware support is unavailable. This is a much less
> problematic bug as as it only pops up if SGX is soft-disabled (the case being
> handled by cpu_has_sgx()) or if SGX is supported for bare metal but not in the
> VMCS (will never happen when running on bare metal, but can theoertically happen
> when running in a VM).
>
> Last but not least, KVM should ideally have module params reflect KVM's actual
> configuration.
>
> Killing all birds with one stone, simply clear enable_sgx when ENCLS-exiting isn't
> supported. The #ifdef is a little gross, but I think it's marginally less ugly
> than having vmx.c define a dummy boolean.
>
> Compile tested only...
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9dba04b6b019..65f092e4a81b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8263,6 +8263,11 @@ static __init int hardware_setup(void)
> if (!cpu_has_virtual_nmis())
> enable_vnmi = 0;
>
> +#ifdef CONFIG_X86_SGX_KVM
> + if (!cpu_has_vmx_encls_vmexit())
> + enable_sgx = false;
> +#endif
> +
> /*
> * set_apic_access_page_addr() is used to reload apic access
> * page upon invalidation. No need to do anything if not
>
Thanks for all the suggestions and explanations, I am going to apply
your changes and send v2!
Emanuele