Re: [PATCH v3 0/4] KVM: Expose speculation control feature to guests

From: KarimAllah Ahmed
Date: Tue Jan 30 2018 - 04:33:14 EST


On 01/30/2018 10:00 AM, David Woodhouse wrote:


On Tue, 2018-01-30 at 01:10 +0100, KarimAllah Ahmed wrote:
Add direct access to speculation control MSRs for KVM guests. This allows the
guest to protect itself against Spectre V2 using IBRS+IBPB instead of a
retpoline+IBPB based approach.

It also exposes the ARCH_CAPABILITIES MSR which is going to be used by future
Intel processors to indicate RDCL_NO and IBRS_ALL.

Thanks. I think you've already fixed the SPEC_CTRL patch in the git
tree so that it adds F(IBRS) to kvm_cpuid_8000_0008_ebx_x86_features,
right?

Yup, this is already fixed in the tree.


The SVM part of Ashok's IBPB patch is still exposing the PRED_CMD MSR
to guests based on boot_cpu_has(IBPB), not based on the *guest*
capabilities. Looking back at Paolo's patch set from January 9th, it
was done differently there but I think it had the same behaviour?
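
To make the distinction concrete, here is a minimal userspace sketch of the two gating policies. It is not the KVM code: struct model_vcpu, both pred_cmd_allowed_*() helpers and model_set_pred_cmd() are hypothetical names invented for illustration, and the two booleans merely stand in for boot_cpu_has(IBPB) and for whatever the VMM put in the guest's CPUID. Only the PRED_CMD bit layout (bit 0 = IBPB) is taken from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRED_CMD_IBPB (1ull << 0)   /* bit 0 of IA32_PRED_CMD */

/* Hypothetical model: one flag for the host CPU, one for the guest's CPUID. */
struct model_vcpu {
    bool host_has_ibpb;    /* stands in for boot_cpu_has(IBPB) */
    bool guest_has_ibpb;   /* stands in for the guest-visible CPUID bit */
};

/* Policy described above for the SVM part: host capability only. */
bool pred_cmd_allowed_host_based(const struct model_vcpu *v)
{
    return v->host_has_ibpb;
}

/* Alternative policy: additionally require the guest CPUID bit. */
bool pred_cmd_allowed_guest_based(const struct model_vcpu *v)
{
    return v->host_has_ibpb && v->guest_has_ibpb;
}

/* Returns 0 on success, 1 where a real hypervisor would inject #GP. */
int model_set_pred_cmd(const struct model_vcpu *v, uint64_t data,
                       bool (*allowed)(const struct model_vcpu *))
{
    assert(v != 0 && allowed != 0);
    if (!allowed(v))
        return 1;               /* MSR not exposed under this policy */
    if (data & ~PRED_CMD_IBPB)
        return 1;               /* reserved bits set */
    return 0;
}
```

With host_has_ibpb set and guest_has_ibpb clear, the host-based policy accepts the write while the guest-based one rejects it, which is the behavioural difference being discussed.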

The rest of Paolo's patch set I think has been covered, except 6/8:
lkml.kernel.org/r/20180109120311.27565-7-pbonzini@xxxxxxxxxx

That exposes SPEC_CTRL for SVM too (since AMD now apparently has it).
If adding that ends up with duplicate MSR handling for get/set, perhaps
that wants shifting up into kvm_[sg]et_msr_common()? Although I don't
see offhand where you'd put the ->spec_ctrl field in that case. It
doesn't want to live in the generic (even to non-x86) struct kvm_vcpu.
So maybe a little bit of duplication is the best answer.

Other than those details, I think we're mostly getting close. Do we
want to add STIBP on top? There is some complexity there, which is why
I was happier getting these first bits ready before piling that on
too.

I believe Ashok sent you a change which made us do IBPB on *every*
vmexit; I don't think we need that. It's currently done in vcpu_load()
which means we'll definitely have done it between running one vCPU and
the next, and when vCPUs are pinned we basically never need to do it.
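
As a rough illustration of the cost difference, the following userspace sketch counts simulated barriers under both policies. It is a model under stated assumptions, not the kernel's vcpu_load() path: struct model_pcpu and all model_* functions are hypothetical, and "a different vCPU was loaded on this physical CPU" is assumed as the trigger for the load-time barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-physical-CPU state for the model. */
struct model_pcpu {
    int last_vcpu_id;         /* -1 until some vCPU has run here */
    unsigned long barriers;   /* number of simulated IBPBs issued */
};

void model_pcpu_init(struct model_pcpu *p)
{
    p->last_vcpu_id = -1;
    p->barriers = 0;
}

/* Load-time policy: barrier only when a different vCPU takes over. */
void model_vcpu_load(struct model_pcpu *p, int vcpu_id)
{
    assert(p != NULL && vcpu_id >= 0);
    if (p->last_vcpu_id != vcpu_id) {
        p->barriers++;
        p->last_vcpu_id = vcpu_id;
    }
}

/* Alternative policy: unconditional barrier on every vmexit. */
void model_vmexit_always_barrier(struct model_pcpu *p)
{
    assert(p != NULL);
    p->barriers++;
}
```

In this model a pinned vCPU that is loaded repeatedly on the same physical CPU pays for a single barrier, whereas the per-vmexit policy pays once per exit regardless of pinning.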

We know that VMM (e.g. qemu) userspace could be vulnerable to attacks
from guest ring 3, because there is no flush between the vmexit and the
host kernel "returning" to the userspace thread. Doing a full IBPB on
*every* vmexit would protect from that, but it's overkill. If that's
the reason, let's come up with something better.

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B