Re: [PATCH v2 3/4] x86/bugs: KVM: Add support for SRSO_MSR_FIX

From: Sean Christopherson
Date: Mon Dec 16 2024 - 13:51:28 EST


On Mon, Dec 16, 2024, Borislav Petkov wrote:
> On Wed, Dec 11, 2024 at 02:27:42PM -0800, Sean Christopherson wrote:
> > How much cost are we talking?
>
> Likely 1-2%.
>
> That's why I'm simply enabling it by default.
>
> > IIUC, this magic bit reduces how much the CPU is allowed to speculate in order
> > to mitigate potential VM=>host attacks, and that reducing speculation also reduces
> > overall performance.
> >
> > If that's correct, then enabling the magic bit needs to be gated by an appropriate
> > mitagation being enabled, not forced on automatically just because the CPU supports
> > X86_FEATURE_SRSO_MSR_FIX.
>
> Well, in the default case we have safe-RET - the default - but since it is
> not needed anymore, it falls back to this thing which is needed when the
> mitigation is enabled.
>
> That's why it also is in the SRSO_CMD_IBPB_ON_VMEXIT case as it is part of the
> spec_rstack_overflow=ibpb-vmexit mitigation option.
>
> So it kinda already does that. When you disable the mitigation, this one won't
> get enabled either.

But it's a hardware-defined feature flag, so won't it be set by this code?

if (c->extended_cpuid_level >= 0x8000001f)
c->x86_capability[CPUID_8000_001F_EAX] = cpuid_eax(0x8000001f);

srso_select_mitigation() checks the flag for SRSO_CMD_IBPB_ON_VMEXIT

case SRSO_CMD_IBPB_ON_VMEXIT:
if (boot_cpu_has(X86_FEATURE_SRSO_MSR_FIX)) {
pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
break;
}

but I don't see any code that would clear X86_FEATURE_SRSO_MSR_FIX. Am I missing
something?

> > And depending on the cost, it might also make sense to set the bit on-demand, and
> > then clean up when KVM disables virtualization. E.g. wait to set the bit until
> > entry to a guest is imminent.
>
> So the "when to set that bit" discussion kinda remained unfinished the last
> time.

Gah, sorry. I suspect I got thinking about how best to "set it only when really
needed", and got lost in analysis paralysis.

> Here's gist:
>
> You:
>
> | "It's not strictly KVM module load, it's when KVM enables virtualization. E.g.
> | if userspace clears enable_virt_at_load, the MSR will be toggled every time the
> | number of VMs goes from 0=>1 and 1=>0.
> |
> | But why do this in KVM? E.g. why not set-and-forget in init_amd_zen4()?"
>
> I:
>
> | "Because there's no need to impose an unnecessary - albeit small - perf impact
> | on users who don't do virt.
> |
> | I'm currently gravitating towards the MSR toggling thing, i.e., only when the
> | VMs number goes 0=>1 but I'm not sure. If udev rules *always* load kvm.ko then
> | yes, the toggling thing sounds better. I.e., set it only when really needed."
>
> So to answer your current question, it sounds like the user can control the
> on-demand thing with enable_virt_at_load=0, right?

To some extent. But I strongly suspect that the vast, vast majority of end users
will end up with systems that automatically load kvm.ko, but don't run VMs the
majority of the time. Expecting non-KVM to users to detect a 1-2% regression and
track down enable_virt_at_load doesn't seem like a winning strategy.

> Or do you mean something else different?

The other possibility would be to wait to set the bit until a CPU is actually
going to do VMRUN. If we use KVM's "user-return MSR" framework, the bit would
be cleared when the CPU returns to userspace. The only downside to that is KVM
would toggle the bit on CPUs running vCPUs on every exit to userspace, e.g. to
emulate MMIO/IO and other things.

But, userspace exits are relatively slow paths, so if the below is a wash for
performance when running VMs, i.e. the cost of the WRMSRs is either in the noise
or is offset by the regained 1-2% performance for userspace, then I think it's a
no-brainer.

Enabling "full" speculation on return to usersepace means non-KVM tasks won't be
affected, and there's no "sticky" behavior. E.g. another idea would be to defer
setting the bit until VMRUN is imminent, but then wait to clear the bit until
virtualization is disabled. But that has the downside of the bit being set on all
CPUs over time, especially if enable_virt_at_load is true.

Compile tested only...

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e4fad330cd25..061ac5940432 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -256,6 +256,7 @@ DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
* defer the restoration of TSC_AUX until the CPU returns to userspace.
*/
static int tsc_aux_uret_slot __read_mostly = -1;
+static int zen4_bp_cfg_uret_slot __ro_after_init = -1;

static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};

@@ -608,9 +609,6 @@ static void svm_disable_virtualization_cpu(void)
kvm_cpu_svm_disable();

amd_pmu_disable_virt();
-
- if (cpu_feature_enabled(X86_FEATURE_SRSO_MSR_FIX))
- msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
}

static int svm_enable_virtualization_cpu(void)
@@ -688,9 +686,6 @@ static int svm_enable_virtualization_cpu(void)
rdmsr(MSR_TSC_AUX, sev_es_host_save_area(sd)->tsc_aux, msr_hi);
}

- if (cpu_feature_enabled(X86_FEATURE_SRSO_MSR_FIX))
- msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
-
return 0;
}

@@ -1547,6 +1542,11 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
(!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);

+ if (cpu_feature_enabled(X86_FEATURE_SRSO_MSR_FIX))
+ kvm_set_user_return_msr(zen4_bp_cfg_uret_slot,
+ MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT,
+ MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
+
svm->guest_state_loaded = true;
}

@@ -5313,6 +5313,14 @@ static __init int svm_hardware_setup(void)

tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);

+ if (cpu_feature_enabled(X86_FEATURE_SRSO_MSR_FIX)) {
+ zen4_bp_cfg_uret_slot = kvm_add_user_return_msr(MSR_ZEN4_BP_CFG);
+ if (WARN_ON_ONCE(zen4_bp_cfg_uret_slot) < 0) {
+ r = -EIO;
+ goto err;
+ }
+ }
+
if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
kvm_enable_efer_bits(EFER_AUTOIBRS);