Re: [PATCH RFC 0/7] CPU feature evaluation after microcode late loading
From: Sean Christopherson
Date: Thu Jul 02 2020 - 14:42:20 EST

On Thu, Jul 02, 2020 at 06:18:20PM +0300, Mihai Carabas wrote:
> This RFC patch set aims to provide the ability to re-evaluate all CPU
> features and take proper bug mitigation in place after a microcode
> late loading.
>
> This was debated last year and this patch set implements a subset of
> point #2 from Thomas Gleixner's idea:
> https://lore.kernel.org/lkml/alpine.DEB.2.21.1909062237580.1902@xxxxxxxxxxxxxxxxxxxxxxx/
>
> Point #1 was sent as an RFC some time ago
> (https://lkml.org/lkml/2020/4/27/214), but after a discussion with CPU
> vendors (Intel), the metadata file is not easily buildable at this
> moment so we could not advance with it more. Without #1, I know it is
> unlikely to embrace the feature re-evaluation.
>
> Patches from 1 to 4 bring in changes for functions/variables in order to be
> able to use them at runtime.
>
> Patch 5 re-evaluates CPU features, patch 6 is re-probing bugs and patch 7
> deals with speculation blacklist CPUs/microcode versions.

This misses critical functionality in KVM. KVM snapshots boot_cpu_data at
module load time (and does further modifications) for ongoing reuse in
filtering what features are advertised to the userspace VMM. See
kvm_set_cpu_caps() for details.
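
To make the staleness concrete, here is a minimal user-space sketch of the snapshot pattern, not the actual KVM code. The names boot_caps, kvm_caps_model, kvm_snapshot_caps() and reevaluate_after_late_load() are hypothetical stand-ins, and the real kvm_set_cpu_caps() applies further masking that is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NCAP_WORDS 4	/* hypothetical size, chosen only for this model */

/* Stand-in for the capability words in boot_cpu_data. */
static unsigned int boot_caps[NCAP_WORDS];

/* Stand-in for KVM's private copy, taken once at module load. */
static unsigned int kvm_caps_model[NCAP_WORDS];

static void set_cap(unsigned int *caps, unsigned int bit)
{
	assert(bit < NCAP_WORDS * 32);
	caps[bit / 32] |= 1u << (bit % 32);
}

static bool has_cap(const unsigned int *caps, unsigned int bit)
{
	assert(bit < NCAP_WORDS * 32);
	return (caps[bit / 32] & (1u << (bit % 32))) != 0;
}

/* Models module load: copy the global state into a private snapshot. */
static void kvm_snapshot_caps(void)
{
	memcpy(kvm_caps_model, boot_caps, sizeof(kvm_caps_model));
}

/*
 * Models feature re-evaluation after a late load: only the global
 * copy is updated, nothing tells the snapshot owner about it.
 */
static void reevaluate_after_late_load(unsigned int new_bit)
{
	set_cap(boot_caps, new_bit);
}
```

If kvm_snapshot_caps() runs before reevaluate_after_late_load(), the new bit shows up in boot_caps but never in kvm_caps_model, which is the situation described above.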

Even if you found a way to reference kvm_cpu_caps, that still leaves the
problem of existing guests having been created with stale data. Oh, and
KVM also needs to properly handle MSR_IA32_TSX_CTRL.

Rather than forcefully tearing down guests, what about adding a way to block
updates? E.g. KVM would block updates on module load and unblock on module
exit. That puts the onus of coordinating updates on the orchestration layer
where it belongs.
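
A rough sketch of what such a blocker could look like, written as a user-space model. No such interface exists in the series; late_load_block(), late_load_unblock() and late_load_try() are hypothetical names, and the locking a real implementation would need around the check and the update is deliberately left out.

```c
#include <assert.h>
#include <errno.h>

/*
 * Number of subsystems currently vetoing late loading.
 * Single-threaded model: a real version would have to serialize
 * the check in late_load_try() against block/unblock.
 */
static int update_blockers;

/* Called by a consumer of snapshotted state, e.g. on module load. */
static void late_load_block(void)
{
	update_blockers++;
}

/* Called when that consumer goes away, e.g. on module exit. */
static void late_load_unblock(void)
{
	assert(update_blockers > 0);
	update_blockers--;
}

/* Returns 0 if the update may proceed, -EBUSY while anyone blocks it. */
static int late_load_try(void)
{
	if (update_blockers > 0)
		return -EBUSY;

	/* The microcode update and feature re-evaluation would go here. */
	return 0;
}
```

With a counter rather than a flag, several independent users can block updates at once, and the orchestration layer sees a plain -EBUSY until the last one has unblocked.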

KVM aside, it wouldn't surprise me in the least if there is other code in the
kernel that captures bug state locally. This series feels like it needs a
fair bit of infrastructure to either detect conflicting usage at build time
or actively prevent consuming stale state at runtime.
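
One possible shape for the runtime half of that, sketched as a user-space model: a generation counter that is bumped on every re-evaluation, so that a local copy can be checked before use. Everything here (caps_generation, struct bug_snapshot, the helper names) is hypothetical and only illustrates the idea.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long caps_generation;	/* bumped on every re-evaluation */
static unsigned int bug_flags;		/* stand-in for the global bug bits */

/* A local copy of the bug state, tagged with the generation it came from. */
struct bug_snapshot {
	unsigned int flags;
	unsigned long generation;
};

static struct bug_snapshot snapshot_bugs(void)
{
	struct bug_snapshot s = {
		.flags = bug_flags,
		.generation = caps_generation,
	};
	return s;
}

/* Models re-probing bugs after a late load. */
static void reevaluate_bugs(unsigned int new_flags)
{
	bug_flags = new_flags;
	caps_generation++;
}

static bool snapshot_is_current(const struct bug_snapshot *s)
{
	return s->generation == caps_generation;
}

/* Refuses to hand out flags from a snapshot that predates a re-evaluation. */
static unsigned int snapshot_flags_checked(const struct bug_snapshot *s)
{
	assert(snapshot_is_current(s));
	return s->flags;
}
```

This only catches consumers that go through the checked accessor; code that copies the flags into a plain variable would still need to be found at build or review time.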

There's also the problem of the flags being exposed to userspace via
/proc/cpuinfo, though I suppose that's userspace's problem to not shoot
itself in the foot.