RE: [Ask for help] met a deadlock with switch_fpu_finish on suse 3.0.93-0.8-default kernel

From: Liuyongan
Date: Wed Mar 16 2016 - 02:33:58 EST


> -----Original Message-----
> From: Wangweidong (Dan)
> Sent: Tuesday, March 15, 2016 9:25 PM
> To: tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; hpa@xxxxxxxxx; x86@xxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; torvalds@xxxxxxxxxxxxxxxxxxxx
> Cc: Fengtiantian; Liuyongan; Wangweidong (Dan)
> Subject: [Ask for help] met a deadlock with switch_fpu_finish on suse
> 3.0.93-0.8-default kernel
>
> Hi all,
>
> We have found a deadlock in the SUSE 3.0.93-0.8-default kernel. It occurs
> when restore_fpu_checking() returns an error during a task switch.
> --------------------------------------------
> The call trace is:
> PID: 2415 TASK: ffff880b739d24c0 CPU: 5 COMMAND: "qemu-kvm"
> #0 [ffff880c7f6a6e40] crash_nmi_callback at ffffffff8102460f
> #1 [ffff880c7f6a6e50] notifier_call_chain at ffffffff81465027
> #2 [ffff880c7f6a6e80] __atomic_notifier_call_chain at ffffffff8146506d
> #3 [ffff880c7f6a6e90] notify_die at ffffffff814650bd
> #4 [ffff880c7f6a6ec0] default_do_nmi at ffffffff81462507
> #5 [ffff880c7f6a6ee0] do_nmi at ffffffff81462738
> #6 [ffff880c7f6a6ef0] restart_nmi at ffffffff81461c91
>     [exception RIP: _raw_spin_lock+21]
>     RIP: ffffffff814611e5 RSP: ffff8809d8d1ba80 RFLAGS: 00000093
>     RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000093
>     RDX: ffff8809d8d1ba80 RSI: 0000000000000018 RDI: 0000000000000001
>     RBP: ffffffff814611e5 R8: ffffffff814611e5 R9: 0000000000000018
>     R10: ffff8809d8d1ba80 R11: 0000000000000093 R12: ffffffffffffffff
>     R13: ffff880c7f6b0a00 R14: 0000000000000005 R15: 000000000000e2b8
>     ORIG_RAX: 000000000000e2b8 CS: 0010 SS: 0018
> --- <DOUBLEFAULT exception stack> ---
> #7 [ffff8809d8d1ba80] _raw_spin_lock at ffffffff814611e5
> #8 [ffff8809d8d1ba80] try_to_wake_up at ffffffff81054afb
> #9 [ffff8809d8d1bad0] pollwake at ffffffff8116cfc6
> #10 [ffff8809d8d1bb10] __wake_up_common at ffffffff81046e1a
> #11 [ffff8809d8d1bb50] __wake_up at ffffffff8104bf43
> #12 [ffff8809d8d1bb90] __send_signal at ffffffff81074bfd
> #13 [ffff8809d8d1bbd0] force_sig_info at ffffffff81076194
> #14 [ffff8809d8d1bc00] __switch_to at ffffffff81001930
> #15 [ffff8809d8d1bcf0] reschedule_interrupt at ffffffff8146a06e
> #16 [ffff8809d8d1bd58] vmx_handle_external_intr at ffffffffa03c3f4c [kvm_intel]
> #17 [ffff8809d8d1bd80] vcpu_enter_guest at ffffffffa0363487 [kvm]
> #18 [ffff8809d8d1be00] __vcpu_run at ffffffffa0363743 [kvm]
> #19 [ffff8809d8d1be40] kvm_arch_vcpu_ioctl_run at ffffffffa0364438 [kvm]
> #20 [ffff8809d8d1be70] kvm_vcpu_ioctl at ffffffffa0350cee [kvm]
> #21 [ffff8809d8d1bf10] do_vfs_ioctl at ffffffff8116bd1b
> #22 [ffff8809d8d1bf40] sys_ioctl at ffffffff8116c0e1
> #23 [ffff8809d8d1bf80] system_call_fastpath at ffffffff81469172
> --------------------------------------------
>
> We found the following patch:
> commit 80ab6f1e8c981b1b6604b2f22e36c917526235cd
> "i387: use 'restore_fpu_checking()' directly in task switching code"
>
> This patch removes the __math_state_restore() call from switch_fpu_finish(),
> like this:
>
>  static inline void switch_fpu_finish(struct task_struct *new, fpu_switch_t fpu)
>  {
> -	if (fpu.preload)
> -		__math_state_restore(new);
> +	if (fpu.preload) {
> +		if (unlikely(restore_fpu_checking(new)))
> +			__thread_fpu_end(new);
> +	}
>  }
>
> So with this patch, when restore_fpu_checking() fails in switch_fpu_finish(),
> it no longer calls force_sig().
>
>
> 1. Would this patch fix the issue (the deadlock)?
> 2. We don't understand why restore_fpu_checking() would fail. Does anyone
> know?

Here is a patch whose commit message describes one way the FPU restore could
fail. Does anybody know of any other cause?

commit 42bdf991f4cad9678ee2b98c5c2e9299a3f986ef
Author: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
Date: Mon Apr 15 23:30:13 2013 -0300

    KVM: x86: fix maintenance of guest/host xcr0 state

    Emulation of xcr0 writes zero guest_xcr0_loaded variable so that
    subsequent VM-entry reloads CPU's xcr0 with guests xcr0 value.

    However, this is incorrect because guest_xcr0_loaded variable is
    read to decide whether to reload hosts xcr0.

    In case the vcpu thread is scheduled out after the guest_xcr0_loaded = 0
    assignment, and scheduler decides to preload FPU:

    switch_to
    {
      __switch_to
        __math_state_restore
          restore_fpu_checking
            fpu_restore_checking
              if (use_xsave())
                fpu_xrstor_checking
                  xrstor64 with CPU's xcr0 == guests xcr0

    Fix by properly restoring hosts xcr0 during emulation of xcr0 writes.


> 3. If the patch can fix the problem, we would like to know: if
> restore_fpu_checking(tsk) really fails and we do not force a SIGSEGV to
> the task, would that introduce other issues?
>
> Regards,
> Weidong
>
>