Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU

From: Liran Alon
Date: Mon Jan 08 2018 - 15:00:59 EST




On 08/01/18 20:08, Paolo Bonzini wrote:
From: Tom Lendacky <thomas.lendacky@xxxxxxx>

Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
going to run a VCPU different from what was previously run. Nested
virtualization uses the same VMCB for the second level guest, but the
L1 hypervisor should be using IBRS to protect itself.

Signed-off-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 779981a00d01..dd744d896cec 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -289,6 +289,7 @@ struct amd_svm_iommu_ir {
module_param(vgif, int, 0444);

static bool __read_mostly have_spec_ctrl;
+static bool __read_mostly have_ibpb_support;

static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
@@ -540,6 +541,7 @@ struct svm_cpu_data {
struct kvm_ldttss_desc *tss_desc;

struct page *save_area;
+ struct vmcb *current_vmcb;
};

static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1151,6 +1153,11 @@ static __init int svm_hardware_setup(void)
pr_info("kvm: SPEC_CTRL available\n");
else
pr_info("kvm: SPEC_CTRL not available\n");
+ have_ibpb_support = have_spec_ctrl || cpu_has_ibpb_support();
+ if (have_ibpb_support)
+ pr_info("kvm: IBPB_SUPPORT available\n");
+ else
+ pr_info("kvm: IBPB_SUPPORT not available\n");

return 0;

@@ -1725,11 +1732,19 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
+
+ /*
+ * The VMCB could be recycled, causing a false negative in
+ * svm_vcpu_load; block speculative execution.
+ */
+ if (have_ibpb_support)
+ wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
}

static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
int i;

if (unlikely(cpu != vcpu->cpu)) {
@@ -1758,6 +1773,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (static_cpu_has(X86_FEATURE_RDTSCP))
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);

+ if (sd->current_vmcb != svm->vmcb) {
+ sd->current_vmcb = svm->vmcb;
+ if (have_ibpb_support)
+ wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
+ }
+
avic_vcpu_load(vcpu, cpu);
}

@@ -2798,6 +2819,11 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
if (!nested_vmcb)
return 1;

+ /*
+ * No need for IBPB here, the L1 hypervisor should be running with
+ * IBRS=1 and inserts one already when switching L2 VMs.
+ */
+

I am not fully convinced yet this is OK from security perspective.
From the CPU point-of-view, both L1 & L2 run in the same prediction-mode (as they are both running in SVM guest-mode). Therefore, IBRS will not hide the BHB & BTB of L1 from L2.

This is how I understand things:
1. When transition between contexts in same prediction-mode, software is responsible for issuing an IBPB to basically "delete" the "branch prediction data" of the previous context such that the new context won't be able to use it.
This is orthogonal to IBRS which makes sure new context won't use "branch prediction data" that was created by a previous less-privileged prediction-mode as we are talking about transitioning between 2 contexts in same physical prediction-mode.
2. Because of (1), KVM was modified to issue IBPB when replacing active VMCB/VMCS on CPU.
3. For the nested-virtualization case, L1 & L2 both run in same physical prediction-mode. As from physical CPU perspective, they are both running in SVM guest-mode. Therefore, L0 should issue IBPB when transitioning from L1 to L2 and vice-versa.
4. In nested VMX case, this already happens because transitioning between L1 & L2 involves changing active VMCS on CPU (from vmcs01 to vmcs02) which already issues an IBPB.
5. However, nested SVM is reusing the same VMCB when transitioning between L1 & L2 and therefore we should explicitly issue an IBPB in nested_svm_vmrun() & nested_svm_vmexit().
6. One can implicitly assume that L1 hypervisor did issued a IBPB when loading an VMCB and therefore it doesn't have to do it again in L0.
However, I see 2 problems with this approach:
(a) We don't protect old non-patched L1 hypervisors from Spectre even though we could easily do that from L0 with small performance hit.
(b) When L1 hypervisor runs only a single L2 VM, it doesn't have to issue an IBPB when loading the VMCB (as it didn't run any other VMCB before) and it should be sufficient from L1 perspective to just write 1 to IBRS. However, in our nested-case, this is a security-hole.

Am I misunderstanding something?

Regards,
-Liran

/* Exit Guest-Mode */
leave_guest_mode(&svm->vcpu);
svm->nested.vmcb = 0;
@@ -3061,6 +3087,11 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
if (!nested_vmcb)
return false;

+ /*
+ * No need for IBPB here, since the nested VM is less privileged. The
+ * L1 hypervisor inserts one already when switching L2 VMs.
+ */
+
if (!nested_vmcb_checks(nested_vmcb)) {
nested_vmcb->control.exit_code = SVM_EXIT_ERR;
nested_vmcb->control.exit_code_hi = 0;