Re: frequent lockups in 3.18rc4

From: Dave Jones
Date: Wed Nov 19 2014 - 09:59:53 EST


On Tue, Nov 18, 2014 at 08:40:55PM -0800, Linus Torvalds wrote:
> On Tue, Nov 18, 2014 at 6:19 PM, Dave Jones <davej@xxxxxxxxxx> wrote:
> >
> > NMI watchdog: BUG: soft lockup - CPU#2 stuck for 21s! [trinity-c42:31480]
> > CPU: 2 PID: 31480 Comm: trinity-c42 Not tainted 3.18.0-rc5+ #91 [loadavg: 174.61 150.35 148.64 9/411 32140]
> > RIP: 0010:[<ffffffff8a1798b4>] [<ffffffff8a1798b4>] context_tracking_user_enter+0xa4/0x190
> > Call Trace:
> > [<ffffffff8a012fc5>] syscall_trace_leave+0xa5/0x160
> > [<ffffffff8a7d8624>] int_check_syscall_exit_work+0x34/0x3d
>
> Hmm, if we are getting soft-lockups here, maybe it suggest too much exit-work.
>
> Some TIF_NOHZ loop, perhaps? You have nohz on, don't you?
>
> That makes me wonder: does the problem go away if you disable NOHZ?

Aparently not.

NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c75:25175]
CPU: 3 PID: 25175 Comm: trinity-c75 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
task: ffff8800364e44d0 ti: ffff880192d2c000 task.ti: ffff880192d2c000
RIP: 0010:[<ffffffff94175be7>] [<ffffffff94175be7>] context_tracking_user_exit+0x57/0x120
RSP: 0018:ffff880192d2fee8 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000100000046 RCX: 000000336ee35b47
RDX: 0000000000000001 RSI: ffffffff94ac1e84 RDI: ffffffff94a93725
RBP: ffff880192d2fef8 R08: 00007f9b74d0b740 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff940d8503
R13: ffff880192d2fe98 R14: ffffffff943884e7 R15: ffff880192d2fe48
FS: 00007f9b74d0b740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000336f1b7740 CR3: 0000000229a95000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
ffff880192d30000 0000000000080000 ffff880192d2ff78 ffffffff94012c25
00007f9b747a5000 00007f9b747a5068 0000000000000000 0000000000000000
0000000000000000 ffffffff9437b3be 0000000000000000 0000000000000000
Call Trace:
[<ffffffff94012c25>] syscall_trace_enter_phase1+0x125/0x1a0
[<ffffffff9437b3be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff947d41bf>] tracesys+0x14/0x4a
Code: 42 fd ff 48 c7 c7 7a 1e ac 94 e8 25 29 21 00 65 8b 04 25 34 f7 1c 00 83 f8 01 74 28 f6 c7 02 74 13 0f 1f 00 e8 bb 43 fd ff 53 9d <5b> 41 5c 5d c3 0f 1f 40 00 53 9d e8 89 42 fd ff eb ee 0f 1f 80
sending NMI to other CPUs:
NMI backtrace for cpu 1
CPU: 1 PID: 25164 Comm: trinity-c64 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 9/410 27945]
task: ffff88011600dbc0 ti: ffff8801a99a4000 task.ti: ffff8801a99a4000
RIP: 0010:[<ffffffff940fb71e>] [<ffffffff940fb71e>] generic_exec_single+0xee/0x1a0
RSP: 0018:ffff8801a99a7d18 EFLAGS: 00000202
RAX: 0000000000000000 RBX: ffff8801a99a7d20 RCX: 0000000000000038
RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000000000000000
RBP: ffff8801a99a7d78 R08: ffff880242b57ce0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
R13: 0000000000000001 R14: ffff880083c28948 R15: ffffffff94166aa0
FS: 00007f9b74d0b740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000001 CR3: 00000001d8611000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
ffff8801a99a7d28 0000000000000000 ffffffff94166aa0 ffff880083c28948
0000000000000003 00000000e38f9aac ffff880083c28948 00000000ffffffff
0000000000000003 ffffffff94166aa0 ffff880083c28948 0000000000000001
Call Trace:
[<ffffffff94166aa0>] ? perf_swevent_add+0x120/0x120
[<ffffffff94166aa0>] ? perf_swevent_add+0x120/0x120
[<ffffffff940fb89a>] smp_call_function_single+0x6a/0xe0
[<ffffffff940a172b>] ? preempt_count_sub+0x7b/0x100
[<ffffffff941671aa>] perf_event_read+0xca/0xd0
[<ffffffff94167240>] perf_event_read_value+0x90/0xe0
[<ffffffff941689c6>] perf_read+0x226/0x370
[<ffffffff942fbfb7>] ? security_file_permission+0x87/0xa0
[<ffffffff941eafff>] vfs_read+0x9f/0x180
[<ffffffff941ebbd8>] SyS_read+0x58/0xd0
[<ffffffff947d42c9>] tracesys_phase2+0xd4/0xd9
Code: 48 89 de 48 03 14 c5 20 65 d1 94 48 89 df e8 8a 4b 28 00 84 c0 75 46 45 85 ed 74 11 f6 43 18 01 74 0b 0f 1f 00 f3 90 f6 43 18 01 <75> f8 31 c0 48 8b 4d c8 65 48 33 0c 25 28 00 00 00 0f 85 8e 00
NMI backtrace for cpu 0
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 35.055 msecs
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 11/410 27945]
task: ffffffff94c164c0 ti: ffffffff94c00000 task.ti: ffffffff94c00000
RIP: 0010:[<ffffffff943dd415>] [<ffffffff943dd415>] intel_idle+0xd5/0x180
RSP: 0018:ffffffff94c03e28 EFLAGS: 00000046
RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffff94c03fd8 RDI: 0000000000000000
RBP: ffffffff94c03e58 R08: 000000008baf8b86 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000032 R14: 0000000000000004 R15: ffffffff94c00000
FS: 0000000000000000(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f162e060000 CR3: 0000000014c11000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
0000000094c03e58 5955c5b31ad5e8cf ffffe8fffee031a8 0000000000000005
ffffffff94ca9dc0 0000000000000000 ffffffff94c03ea8 ffffffff94661f05
00001cb7dcf6fd93 ffffffff94ca9f90 ffffffff94c00000 ffffffff94d18870
Call[31557.908912] NMI backtrace for cpu 2
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 68.178 msecs
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.18.0-rc5+ #92 [loadavg: 168.72 151.72 150.38 10/410 27945]
task: ffff880242b596f0 ti: ffff880242b6c000 task.ti: ffff880242b6c000
RIP: 0010:[<ffffffff943dd415>] [<ffffffff943dd415>] intel_idle+0xd5/0x180
RSP: 0018:ffff880242b6fdf8 EFLAGS: 00000046
RAX: 0000000000000032 RBX: 0000000000000010 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880242b6ffd8 RDI: 0000000000000002
RBP: ffff880242b6fe28 R08: 000000008baf8b86 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000032 R14: 0000000000000004 R15: ffff880242b6c000
FS: 0000000000000000(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000014c11000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
0000000242b6fe28 da97aa9b9f42090a ffffe8ffff2031a8 0000000000000005
ffffffff94ca9dc0 0000000000000002 ffff880242b6fe78 ffffffff94661f05
00001cb7dcdd1af6 ffffffff94ca9f90 ffff880242b6c000 ffffffff94d18870
Call Trace:
[<ffffffff94661f05>] cpuidle_enter_state+0x55/0x1c0
[<ffffffff94662127>] cpuidle_enter+0x17/0x20
[<ffffffff940b94a3>] cpu_startup_entry+0x423/0x4d0
[<ffffffff9402b763>] start_secondary+0x1a3/0x220
Code: 31 d2 65 48 8b 34 25 08 ba 00 00 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <65> 48 8b 0c 25 08 ba 00 00 f0 80 a1 3a c0 ff ff df 0f ae f0 48
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 95.994 msecs


<tangent>

Also, today I learned we can reach the perf_event_read code from
read(). Given I had /proc/sys/kernel/perf_event_paranoid set to 1,
I'm not sure how this is even possible. The only user of perf_fops
is perf_event_open syscall _after_ it's checked that sysctl.

Oh, there's an ioctl path to perf too. Though trinity
doesn't know anything about it, so I find it surprising if it
managed to pull the right combination of entropy to make that
do the right thing. Still, that ioctl path probably needs
to also be checking that sysctl shouldn't it ?

Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/