Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU

From: Paolo Bonzini
Date: Tue Jan 09 2018 - 06:08:00 EST


On 08/01/2018 21:00, Liran Alon wrote:
>
>
> On 08/01/18 20:08, Paolo Bonzini wrote:
>> From: Tom Lendacky <thomas.lendacky@xxxxxxx>
>>
>> Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
>> going to run a VCPU different from what was previously run. Nested
>> virtualization uses the same VMCB for the second level guest, but the
>> L1 hypervisor should be using IBRS to protect itself.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
>> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> ---
>>  arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++++++
>>  1 file changed, 31 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 779981a00d01..dd744d896cec 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -289,6 +289,7 @@ struct amd_svm_iommu_ir {
>>  module_param(vgif, int, 0444);
>>
>> Â static bool __read_mostly have_spec_ctrl;
>> +static bool __read_mostly have_ibpb_support;
>>
>>  static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
>>  static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
>> @@ -540,6 +541,7 @@ struct svm_cpu_data {
>>      struct kvm_ldttss_desc *tss_desc;
>>
>>      struct page *save_area;
>> +    struct vmcb *current_vmcb;
>>  };
>>
>>  static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
>> @@ -1151,6 +1153,11 @@ static __init int svm_hardware_setup(void)
>>          pr_info("kvm: SPEC_CTRL available\n");
>>      else
>>          pr_info("kvm: SPEC_CTRL not available\n");
>> +    have_ibpb_support = have_spec_ctrl || cpu_has_ibpb_support();
>> +    if (have_ibpb_support)
>> +        pr_info("kvm: IBPB_SUPPORT available\n");
>> +    else
>> +        pr_info("kvm: IBPB_SUPPORT not available\n");
>>
>>      return 0;
>>
>> @@ -1725,11 +1732,19 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>>      __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>>      kvm_vcpu_uninit(vcpu);
>>      kmem_cache_free(kvm_vcpu_cache, svm);
>> +
>> +    /*
>> +     * The VMCB could be recycled, causing a false negative in
>> +     * svm_vcpu_load; block speculative execution.
>> +     */
>> +    if (have_ibpb_support)
>> +        wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>>  }
>>
>>  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  {
>>      struct vcpu_svm *svm = to_svm(vcpu);
>> +    struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>>      int i;
>>
>>      if (unlikely(cpu != vcpu->cpu)) {
>> @@ -1758,6 +1773,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>      if (static_cpu_has(X86_FEATURE_RDTSCP))
>>          wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>>
>> +    if (sd->current_vmcb != svm->vmcb) {
>> +        sd->current_vmcb = svm->vmcb;
>> +        if (have_ibpb_support)
>> +            wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>> +    }
>> +
>>      avic_vcpu_load(vcpu, cpu);
>>  }
>>
>> @@ -2798,6 +2819,11 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
>>      if (!nested_vmcb)
>>          return 1;
>>
>> +    /*
>> +     * No need for IBPB here, the L1 hypervisor should be running with
>> +     * IBRS=1 and inserts one already when switching L2 VMs.
>> +     */
>> +
>
> I am not fully convinced yet that this is OK from a security perspective.
> From the CPU's point of view, both L1 & L2 run in the same prediction mode
> (as they are both running in SVM guest mode). Therefore, IBRS will not
> hide the BHB & BTB of L1 from L2.

Indeed, IBRS will not hide the indirect branch predictor state of L2
user mode from L1 user mode.

On current generation processors, as I understand it, IBRS simply
disables the indirect branch predictor when set to 1. Therefore, as
long as the L1 hypervisor sets IBRS=1, the indirect branch predictor
state left by L2 does not affect execution of the L1 hypervisor.

Future processors might have a mode that says "just set IBRS=1 and I'll
DTRT". If that's done by indexing the BTB using the full virtual
address, the CPL _and_ the ASID/PCID/VPID, then IBPB is not needed here
because the L2 VM uses a different ASID. If that's done by only
augmenting the BTB index with the CPL, we'd need an IBPB. But in this
new world we've been thrown into, that would be a processor bug...

Note that AMD currently doesn't implement IBRS at all. In order to
mitigate against CVE-2017-5715, the hypervisor should issue an IBPB on
every vmexit (in both L0 and L1). However, the cost of that is very
high. While we may want a paranoia mode that does it, it is outside the
scope of this series (which is more of a "let's unblock QEMU patches"
thing than a complete fix).

> 6. One can implicitly assume that the L1 hypervisor issued an IBPB when
> loading a VMCB and therefore it doesn't have to be done again in L0.
> However, I see 2 problems with this approach:
> (a) We don't protect old, non-patched L1 hypervisors from Spectre even
> though we could easily do so from L0 with a small performance hit.

Yeah, that's a nice thing to have. However, I disagree on the "small"
performance hit... on a patched hypervisor, the cost of a double fix can
be very noticeable.

> (b) When the L1 hypervisor runs only a single L2 VM, it doesn't have to
> issue an IBPB when loading the VMCB (as it didn't run any other VMCB
> before), and it is sufficient from L1's perspective to just write 1
> to IBRS. However, in our nested case, this is a security hole.

This is the main difference in our reasoning. I think IBRS=1 (or IBPB
on vmexit) should be enough.

Paolo

> Am I misunderstanding something?
>
> Regards,
> -Liran
>
>>      /* Exit Guest-Mode */
>>      leave_guest_mode(&svm->vcpu);
>>      svm->nested.vmcb = 0;
>> @@ -3061,6 +3087,11 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
>>      if (!nested_vmcb)
>>          return false;
>>
>> +    /*
>> +     * No need for IBPB here, since the nested VM is less privileged. The
>> +     * L1 hypervisor inserts one already when switching L2 VMs.
>> +     */
>> +
>>      if (!nested_vmcb_checks(nested_vmcb)) {
>>          nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
>>          nested_vmcb->control.exit_code_hi = 0;
>>
>