Re: [PATCH RESEND] KVM: X86: Fix load bad host fpu state

From: Peter Xu
Date: Mon Dec 11 2017 - 22:37:05 EST


On Tue, Dec 12, 2017 at 05:51:26AM +0800, Wanpeng Li wrote:
> 2017-12-12 4:48 GMT+08:00 David Hildenbrand <david@xxxxxxxxxx>:
> > On 10.12.2017 22:44, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >>
> >> ------------[ cut here ]------------
> >> Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
> >> WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
> >> CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G B OE 4.15.0-rc2+ #10
> >> RIP: 0010:ex_handler_fprestore+0x88/0x90
> >> Call Trace:
> >> fixup_exception+0x4e/0x60
> >> do_general_protection+0xff/0x270
> >> general_protection+0x22/0x30
> >> RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
> >> RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
> >> kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
> >> kvm_apic_accept_events+0x1c0/0x240 [kvm]
> >> kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
> >> kvm_vcpu_ioctl+0x479/0x880 [kvm]
> >> do_vfs_ioctl+0x142/0x9a0
> >> SyS_ioctl+0x74/0x80
> >> do_syscall_64+0x15f/0x600
> >>
> >> This can be reproduced by running any testcase in kvm-unit-tests since
> >> the qemu userspace FPU context is not initialized, which results in the
> >> init path from kvm_apic_accept_events() will load/put qemu userspace
> >> FPU context w/o initialized. In addition, w/o this splatting we still
> >> should initialize vcpu->arch.user_fpu instead of current->thread.fpu.
> >> This patch fixes it by initializing qemu user FPU context if it is
> >> uninitialized before KVM_RUN.
> >>
> >> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> >> Cc: Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx>
> >> Cc: Rik van Riel <riel@xxxxxxxxxx>
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> Fixes: f775b13eedee (x86,kvm: move qemu/guest FPU switching out to vcpu_run)
> >> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >> ---
> >> arch/x86/kvm/x86.c | 7 +++++--
> >> 1 file changed, 5 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index a92b22f..063a643 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -7273,10 +7273,13 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
> >>
> >> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> >> {
> >> - struct fpu *fpu = &current->thread.fpu;
> >> + struct fpu *fpu = &vcpu->arch.user_fpu;
> >> int r;
> >>
> >> - fpu__initialize(fpu);
> >> + if (!fpu->initialized) {
> >> + fpstate_init(&fpu->state);
> >> + fpu->initialized = 1;
> >> + }
> >
> > Is there a chance of keeping using fpu__initialize() ? Duplicating the
> > code is ugly.
>
> There is a warning in fpu__initialize() which results in just
> current->thread.fpu can take advantage of.
>
> >
> > E.g. can't we simply initialize that in kvm_load_guest_fpu?
>
> We still miss to initialize qemu user FPU context for the above calltrace.

IMHO we should not really init the user FPU since we should always
load FPU then put FPU. The problem now is that for vcpus that with
vcpu_id>1 we'll first put the FPU before loading it. So, instead how
about we make sure we load the FPU first even for non-bootstrap vcpus?
And we can actually drop fpu__initialize() call, like:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 064eba25c215..e6aed00ba968 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7263,13 +7263,12 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)

int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
- struct fpu *fpu = &current->thread.fpu;
int r;

- fpu__initialize(fpu);
-
kvm_sigset_activate(vcpu);

+ kvm_load_guest_fpu(vcpu);
+
if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
if (kvm_run->immediate_exit) {
r = -EINTR;
@@ -7295,14 +7294,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
}

- kvm_load_guest_fpu(vcpu);
-
if (unlikely(vcpu->arch.complete_userspace_io)) {
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
vcpu->arch.complete_userspace_io = NULL;
r = cui(vcpu);
if (r <= 0)
- goto out_fpu;
+ goto out;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);

@@ -7311,9 +7308,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
else
r = vcpu_run(vcpu);

-out_fpu:
- kvm_put_guest_fpu(vcpu);
out:
+ kvm_put_guest_fpu(vcpu);
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);

Thanks,

--
Peter Xu