rcu_preempt self-detected stall on CPU from 4.4-rc4, since 3.17
From: Ross Green
Date: Fri Dec 11 2015 - 06:17:41 EST
I have been getting these stalls in kernels going back to 3.17.
This stall occurs usually under light load but often requires several
days to show itself. I have not found any simple way to trigger the
stall. Indeed heavy workloads seems not to show the fault.
Anyone have any thoughts here?
The recent patch by peterz with kernel/sched/wait.c I thought might
help the situation, but alas after a few days of running 4.4-rc4 the
following turned up.
[179922.003570] INFO: rcu_preempt self-detected stall on CPU
[179922.008178] INFO: rcu_preempt detected stalls on CPUs/tasks:
[179922.008178] 0-...: (1 ticks this GP) idle=a91/1/0
softirq=1296733/1296733 fqs=0
[179922.008178]
[179922.008209] (detected by 1, t=8775 jiffies, g=576439, c=576438, q=102)
[179922.008209] Task dump for CPU 0:
[179922.008209] swapper/0 R
[179922.008209] running
[179922.008209] 0 0 0 0x00000000
[179922.008209] Backtrace:
[179922.008239] Backtrace aborted due to bad frame pointer <c0907f54>
[179922.008239] rcu_preempt kthread starved for 8775 jiffies! g576439
c576438 f0x0 s3 ->state=0x1
[179922.060302] 0-...: (1 ticks this GP) idle=a91/1/0
softirq=1296733/1296733 fqs=0
[179922.068023] (t=8775 jiffies g=576439 c=576438 q=102)
[179922.073913] rcu_preempt kthread starved for 8775 jiffies! g576439
c576438 f0x2 s3 ->state=0x100
[179922.083587] Task dump for CPU 0:
[179922.087097] swapper/0 R running 0 0 0 0x00000000
[179922.093292] Backtrace:
[179922.096313] [<c0013ea8>] (dump_backtrace) from [<c00140a4>]
(show_stack+0x18/0x1c)
[179922.104675] r7:c0908514 r6:80080193 r5:00000000 r4:c090aca8
[179922.110809] [<c001408c>] (show_stack) from [<c005a858>]
(sched_show_task+0xbc/0x110)
[179922.119049] [<c005a79c>] (sched_show_task) from [<c005ccd4>]
(dump_cpu_task+0x40/0x44)
[179922.127624] r5:c0917680 r4:00000000
[179922.131042] [<c005cc94>] (dump_cpu_task) from [<c0082268>]
(rcu_dump_cpu_stacks+0x9c/0xdc)
[179922.140350] r5:c0917680 r4:00000001
[179922.143157] [<c00821cc>] (rcu_dump_cpu_stacks) from [<c008637c>]
(rcu_check_callbacks+0x504/0x8e4)
[179922.153808] r9:c0908514 r8:c0917680 r7:00000066 r6:2eeab000
r5:c0904300 r4:ef7af300
[179922.161499] [<c0085e78>] (rcu_check_callbacks) from [<c00895d0>]
(update_process_times+0x40/0x6c)
[179922.170898] r10:c009a584 r9:00000001 r8:ef7abc4c r7:0000a3a3
r6:4ec3391c r5:00000000
[179922.179901] r4:c090aca8
[179922.182708] [<c0089590>] (update_process_times) from [<c009a580>]
(tick_sched_handle+0x50/0x54)
[179922.192108] r5:c0907f10 r4:ef7abe40
[179922.195983] [<c009a530>] (tick_sched_handle) from [<c009a5d4>]
(tick_sched_timer+0x50/0x94)
[179922.204895] [<c009a584>] (tick_sched_timer) from [<c0089fe4>]
(__hrtimer_run_queues+0x110/0x1a0)
[179922.214324] r7:00000000 r6:ef7abc40 r5:ef7abe40 r4:ef7abc00
[179922.220428] [<c0089ed4>] (__hrtimer_run_queues) from [<c008a674>]
(hrtimer_interrupt+0xac/0x1f8)
[179922.227111] r10:ef7abc78 r9:ef7abc98 r8:ef7abc14 r7:ef7abcb8
r6:ffffffff r5:00000003
[179922.238220] r4:ef7abc00
[179922.238220] [<c008a5c8>] (hrtimer_interrupt) from [<c00170ec>]
(twd_handler+0x38/0x48)
[179922.238220] r10:c09084e8 r9:fa241100 r8:00000011 r7:ef028780
r6:c092574c r5:ef005cc0
[179922.257110] r4:00000001
[179922.257110] [<c00170b4>] (twd_handler) from [<c007c8f8>]
(handle_percpu_devid_irq+0x74/0x8c)
[179922.269683] r5:ef005cc0 r4:ef7b1740
[179922.269683] [<c007c884>] (handle_percpu_devid_irq) from
[<c0078454>] (generic_handle_irq+0x2c/0x3c)
[179922.283233] r9:fa241100 r8:ef008000 r7:00000001 r6:00000000
r5:00000000 r4:c09013e8
[179922.290985] [<c0078428>] (generic_handle_irq) from [<c007872c>]
(__handle_domain_irq+0x64/0xbc)
[179922.300842] [<c00786c8>] (__handle_domain_irq) from [<c00094c0>]
(gic_handle_irq+0x50/0x90)
[179922.303222] r9:fa241100 r8:fa240100 r7:c0907f10 r6:fa24010c
r5:c09087a8 r4:c0925748
[179922.315216] [<c0009470>] (gic_handle_irq) from [<c0014bd4>]
(__irq_svc+0x54/0x90)
[179922.319000] Exception stack(0xc0907f10 to 0xc0907f58)
[179922.331542] 7f00: 00000000
ef7ab390 fe600000 00000000
[179922.331542] 7f20: c0906000 c090849c c0900364 c06a8124 c0907f80
c0944563 c09084e8 c0907f6c
[179922.349029] 7f40: c0907f4c c0907f60 c00287ac c0010ba8 60080113 ffffffff
[179922.349029] r9:c0944563 r8:c0907f80 r7:c0907f44 r6:ffffffff
r5:60080113 r4:c0010ba8
[179922.357116] [<c0010b80>] (arch_cpu_idle) from [<c006f034>]
(default_idle_call+0x28/0x34)
[179922.368926] [<c006f00c>] (default_idle_call) from [<c006f154>]
(cpu_startup_entry+0x114/0x18c)
[179922.368926] [<c006f040>] (cpu_startup_entry) from [<c069fc6c>]
(rest_init+0x90/0x94)
[179922.385284] r7:ffffffff r4:00000002
[179922.393463] [<c069fbdc>] (rest_init) from [<c08bbcec>]
(start_kernel+0x370/0x37c)
[179922.400421] r5:c0947000 r4:00000000
[179922.400421] [<c08bb97c>] (start_kernel) from [<8000807c>] (0x8000807c)
$
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/