[PATCH v2 0/4] KVM: x86: TIF_NEED_FPU_LOAD bug fixes

From: Sean Christopherson
Date: Fri Jan 17 2020 - 14:34:39 EST


Apologies for the quick v2, I'm hoping the fixes can squeak into 5.5 and
I'm about to go offline for a long weekend.

v2: Add a helper to handle TIF_NEED_FPU_LOAD when saving from the current
FPU to the guest/userspace FPU (patch 01). [Dave]

*** v1 cover letter ***

Three bug fixes related to deferring FPU loading to return to userspace,
or in this case, deferring to entering a KVM guest. And a cleanup patch I
couldn't resist throwing on top.

The crux of the matter is that TIF_FPU_NEED_LOAD can be set any time
control is transferred out of KVM, e.g. via IRQ->softirq, not just when
KVM is preempted. The previous attempt to fix the underlying bug(s)
by handling TIF_FPU_NEED_LOAD during kvm_sched_in() only made the bug(s)
harder to hit, i.e. it resolved the preemption case but only shrunk the
window where a softirq could corrupt state.

Paolo, patch 01 will conflict with commit 95145c25a78c ("KVM: x86: Add a
WARN on TIF_NEED_FPU_LOAD in kvm_load_guest_fpu()") that is sitting in
kvm/queue. The kvm/queue commit should simply be dropped.

Patch 01 fixes the original underlying bug, which is that KVM doesn't
handle TIF_FPU_NEED_LOAD when swapping between guest and userspace FPU
state.

Patch 02 fixes (unconfirmed) issues similar to the reported bug where KVM
doesn't ensure CPU FPU state is fresh when accessing it during emulation.
This patch is smoke tested only (via kvm-unit-tests' emulator tests).

Patch 03 reverts the preemption bug fix, which simultaneously restores the
handling of TIF_FPU_NEED_LOAD in vcpu_enter_guest() to fix the reported
bugs[1][2] and removes the now-unnecessary preemption workaround.

Alternatively, the handling of TIF_NEED_FPU_LOAD in the kvm_sched_in()
path could be left in place, i.e it doesn't cause immediate damage, but
as stated before, restoring FPU state after KVM is preempted only makes
it harder to find the actual bugs. Case in point, I originally split
the revert into two separate patches (so that removing the kvm_sched_in()
call wouldn't be tagged for stable), but that only hid the underlying
kvm_put_guest_fpu() bug until I fully reverted the commit.

Patch 04 removes an unused emulator context param from several of the
functions that are fixed in patch 02. The param was left behind during
a previous KVM FPU state rework.

Tested via Thomas Lambertz's mprime reproducer[3], which detected issues
with buggy KVMs on my system in under a minute. Ran clean for ~30 minutes
on each of the first two patches and several hours with all three patches
applied.

[1] https://lore.kernel.org/kvm/1e525b08-6204-3238-5d56-513f82f1d7fb@xxxxxxx
[2] https://lore.kernel.org/kvm/bug-206215-28872@xxxxxxxxxxxxxxxxxxxxxxxxx%2F
[3] https://lore.kernel.org/lkml/217248af-e980-9cb0-ff0d-9773413b9d38@xxxxxxxxxxxxxxxxx

Sean Christopherson (4):
KVM: x86: Handle TIF_NEED_FPU_LOAD in kvm_{load,put}_guest_fpu()
KVM: x86: Ensure guest's FPU state is loaded when accessing for
emulation
KVM: x86: Revert "KVM: X86: Fix fpu state crash in kvm guest"
KVM: x86: Remove unused ctxt param from emulator's FPU accessors

arch/x86/kvm/emulate.c | 67 ++++++++++++++++++++++++++++++++----------
arch/x86/kvm/x86.c | 28 +++++++++++++-----
2 files changed, 72 insertions(+), 23 deletions(-)

--
2.24.1