Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6
From: Paul E. McKenney
Date: Tue Jul 24 2012 - 14:44:46 EST
On Tue, Jul 24, 2012 at 07:10:49PM +0200, Sasha Levin wrote:
> Hi all,
>
> I was fuzzing with trinity inside a KVM tools guest, on the current 3.6, and stumbled on the following:
>
> (Note that it also happens on -next).
>
> [ 215.034674] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 215.035641] 1: (0 ticks this GP) idle=3f5/140000000000001/0 drain=0 . timer=18446744073709551615
OK, so this says that the CPU is awake, so let's look at CPU 1's stack.
It also says that the CPU has not taken any scheduling-globk interrupts
for the past 100K+ jiffies, indicating that the CPU has interrupts
disabled.
> [ 215.035641] (detected by 3, t=111886 jiffies)
> [ 215.035641] sending NMI to all CPUs:
> [ 214.959504] NMI backtrace for cpu 2
> [ 214.959504] CPU 2
> [ 214.959504] Pid: 6583, comm: trinity-child6 Tainted: G W 3.5.0-sasha-01644-g824681b #267
> [ 214.959504] RIP: 0010:[<ffffffff81149aa3>] [<ffffffff81149aa3>] check_flags+0x93/0x1d0
> [ 214.959504] RSP: 0018:ffff880025175b58 EFLAGS: 00000046
> [ 214.959504] RAX: ffff880027910000 RBX: ffff880027910000 RCX: 0000000000000000
> [ 214.959504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
> [ 214.959504] RBP: ffff880025175b58 R08: 0000000000000000 R09: 0000000000000000
> [ 214.959504] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
> [ 214.959504] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
> [ 214.959504] FS: 00007f754d198700(0000) GS:ffff880029800000(0000) knlGS:0000000000000000
> [ 214.959504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 214.959504] CR2: 0000000000000008 CR3: 00000000251aa000 CR4: 00000000000406e0
> [ 214.959504] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 214.959504] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 214.959504] Process trinity-child6 (pid: 6583, threadinfo ffff880025174000, task ffff880027910000)
> [ 214.959504] Stack:
> [ 214.959504] ffff880025175be8 ffffffff8114e610 ffff880027910000 0000000000000000
> [ 214.959504] 0000000000000286 ffffffffffffff10 ffffffff8114e51d 0000000000000010
> [ 214.959504] 0000000000000286 0000000025175bb8 ffff880019a767c8 0000000000000286
> [ 214.959504] Call Trace:
> [ 214.959504] [<ffffffff8114e610>] lock_acquire+0x80/0x270
> [ 214.959504] [<ffffffff8114e51d>] ? lock_release+0x1ad/0x220
> [ 214.959504] [<ffffffff83699dcb>] _raw_spin_lock+0x3b/0x70
> [ 214.959504] [<ffffffff8124a055>] ? __d_lookup+0x145/0x2e0
> [ 214.959504] [<ffffffff8124a055>] __d_lookup+0x145/0x2e0
> [ 214.959504] [<ffffffff8123d216>] lookup_fast+0x136/0x240
> [ 214.959504] [<ffffffff8123be8b>] ? __inode_permission+0xbb/0xd0
> [ 214.959504] [<ffffffff8123bea9>] ? inode_permission+0x9/0x50
> [ 214.959504] [<ffffffff8123da1c>] link_path_walk+0x22c/0x8f0
> [ 214.959504] [<ffffffff8123cc43>] ? path_init+0x213/0x550
> [ 214.959504] [<ffffffff81976e38>] ? __raw_spin_lock_init+0x38/0x70
> [ 214.959504] [<ffffffff8123faa8>] path_openat+0x98/0x400
> [ 214.959504] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13
> [ 214.959504] [<ffffffff8123fe4d>] do_filp_open+0x3d/0xa0
> [ 214.959504] [<ffffffff819770c8>] ? do_raw_spin_unlock+0xc8/0xe0
> [ 214.959504] [<ffffffff8124f703>] ? alloc_fd+0x1d3/0x1f0
> [ 214.959504] [<ffffffff8124f531>] ? alloc_fd+0x1/0x1f0
> [ 214.959504] [<ffffffff8122f125>] do_sys_open+0x115/0x1b0
> [ 214.959504] [<ffffffff8122f1dc>] sys_open+0x1c/0x20
> [ 214.959504] [<ffffffff8369b77d>] system_call_fastpath+0x1a/0x1f
> [ 214.959504] Code: a4 26 84 e8 70 e2 f8 ff e9 25 01 00 00 0f 1f 00 44 8b 05 61 1c 1e 04 45 85 c0 75 4c 65 48 8b 04 25 00 da 00 00 8b b8 b0 08 00 00 <85> ff 75 39 e8 64 b4 81 00 85 c0 0f 84 09 01 00 00 83 3d 45 bf
And this is CPU 1:
> [ 215.026612] NMI backtrace for cpu 1
> [ 215.026612] CPU 1
> [ 215.026612] Pid: 2395, comm: pageattr-test Tainted: G W 3.5.0-sasha-01644-g824681b #267
> [ 215.026612] RIP: 0010:[<ffffffff8109bd5a>] [<ffffffff8109bd5a>] native_write_msr_safe+0xa/0x10
> [ 215.026612] RSP: 0018:ffff88001b803f70 EFLAGS: 00000046
The interrupt flag is zero, so interrupts are disabled. So my question
to you is "Why did do_pageattr_test() or one of the functions it called
disable interrupts for more than one hundred thousand jiffies?"
I can't see where it is disabling interrupts, but the EFLAGS register
above does indicate that they are disabled. Or is this something useless
that is just telling us that receiving the NMI disabled interrupts?
Thanx, Paul
> [ 215.026612] RAX: 0000000000000000 RBX: 000000000000ef80 RCX: 000000000000080b
> [ 215.026612] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000080b
> [ 215.026612] RBP: ffff88001b803f70 R08: 00003ffffffff000 R09: ffff880000000000
> [ 215.026612] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [ 215.026612] R13: ffff880007a98000 R14: ffff88001fc3a4b8 R15: ffff880000000000
> [ 215.026612] FS: 0000000000000000(0000) GS:ffff88001b800000(0000) knlGS:0000000000000000
> [ 215.026612] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 215.026612] CR2: 000000000000000c CR3: 0000000019dcf000 CR4: 00000000000406e0
> [ 215.026612] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 215.026612] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 215.026612] Process pageattr-test (pid: 2395, threadinfo ffff88000ced0000, task ffff88000d773000)
> [ 215.026612] Stack:
> [ 215.026612] ffff88001b803f88 ffffffff81095d07 000000000000ef80 ffff88001b803fa8
> [ 215.026612] ffffffff810901c4 ffff88000ced1db0 0000000000007a98 ffff88000ced1d00
> [ 215.026612] ffffffff8369c4af ffff88000ced1c78 <EOI> ffff88000ced1d00 ffffffff8369aeb4
> [ 215.026612] Call Trace:
> [ 215.026612] <IRQ>
> [ 215.026612] [<ffffffff81095d07>] native_apic_msr_eoi_write+0x27/0x30
> [ 215.026612] [<ffffffff810901c4>] smp_apic_timer_interrupt+0x34/0xa0
> [ 215.026612] [<ffffffff8369c4af>] apic_timer_interrupt+0x6f/0x80
> [ 215.026612] <EOI>
> [ 215.026612] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13
> [ 215.026612] [<ffffffff810a2cb2>] ? lookup_address+0x112/0x170
> [ 215.026612] [<ffffffff810a32ae>] print_split+0x7e/0x200
> [ 215.026612] [<ffffffff810a4300>] pageattr_test+0x80/0x4a0
> [ 215.026612] [<ffffffff8114a2f8>] ? trace_hardirqs_on_caller+0x118/0x140
> [ 215.026612] [<ffffffff83696cbe>] ? schedule_timeout+0x2ce/0x370
> [ 215.026612] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13
> [ 215.026612] [<ffffffff810a4720>] ? pageattr_test+0x4a0/0x4a0
> [ 215.026612] [<ffffffff810a473f>] do_pageattr_test+0x1f/0x50
> [ 215.026612] [<ffffffff8110607d>] kthread+0xad/0xc0
> [ 215.026612] [<ffffffff8369cdb4>] kernel_thread_helper+0x4/0x10
> [ 215.026612] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13
> [ 215.026612] [<ffffffff81105fd0>] ? kthread_flush_work_fn+0x10/0x10
> [ 215.026612] [<ffffffff8369cdb0>] ? gs_change+0x13/0x13
> [ 215.026612] Code: 00 55 89 f9 48 89 e5 0f 32 45 31 c0 89 c7 48 89 d0 44 89 06 48 c1 e0 20 89 f9 5d 48 09 c8 c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 5d 48
> [ 215.035641] NMI backtrace for cpu 3
> [ 215.035641] CPU 3
> [ 215.035641] Pid: 0, comm: swapper/3 Tainted: G W 3.5.0-sasha-01644-g824681b #267
> [ 215.035641] RIP: 0010:[<ffffffff8109bd5a>] [<ffffffff8109bd5a>] native_write_msr_safe+0xa/0x10
> [ 215.035641] RSP: 0018:ffff88000d5b5ce8 EFLAGS: 00000086
> [ 215.035641] RAX: 0000000000000400 RBX: 0000000000000003 RCX: 0000000000000830
> [ 215.035641] RDX: 0000000000000003 RSI: 0000000000000400 RDI: 0000000000000830
> [ 215.035641] RBP: ffff88000d5b5ce8 R08: ffffffff84edc9c0 R09: 0000000000000000
> [ 215.035641] R10: 000000000000739d R11: 0000000000025370 R12: ffffffff84edc9c0
> [ 215.035641] R13: 0000000000000003 R14: 000000000000f042 R15: 0000000000000002
> [ 215.035641] FS: 0000000000000000(0000) GS:ffff880035a00000(0000) knlGS:0000000000000000
> [ 215.035641] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 215.035641] CR2: 0000000000bc5958 CR3: 00000000278b0000 CR4: 00000000000406e0
> [ 215.035641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 215.035641] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 215.035641] Process swapper/3 (pid: 0, threadinfo ffff88000d5b4000, task ffff88000d5b8000)
> [ 215.035641] Stack:
> [ 215.035641] ffff88000d5b5d48 ffffffff81095e0d ffff88000d5b5d58 0000000000000086
> [ 215.035641] 0000000000000002 0008000000000008 ffff88000d5b5d68 0000000000002710
> [ 215.035641] 000000000001b50e 0000000000000001 ffffffff84966008 0000000000000086
> [ 215.035641] Call Trace:
> [ 215.035641] [<ffffffff81095e0d>] __x2apic_send_IPI_mask+0xcd/0x190
> [ 215.035641] [<ffffffff81095ee7>] x2apic_send_IPI_all+0x17/0x20
> [ 215.035641] [<ffffffff810918c3>] arch_trigger_all_cpu_backtrace+0x63/0xa0
> [ 215.035641] [<ffffffff81194513>] print_other_cpu_stall+0x223/0x290
> [ 215.035641] [<ffffffff81194600>] __rcu_pending+0x80/0x200
> [ 215.035641] [<ffffffff8110e1d8>] ? __atomic_notifier_call_chain+0xe8/0x120
> [ 215.035641] [<ffffffff811947bf>] rcu_pending+0x3f/0x70
> [ 215.035641] [<ffffffff811949c4>] rcu_prepare_for_idle+0x1d4/0x430
> [ 215.035641] [<ffffffff81194e6c>] rcu_idle_enter_common+0x24c/0x400
> [ 215.035641] [<ffffffff8110e1f5>] ? __atomic_notifier_call_chain+0x105/0x120
> [ 215.035641] [<ffffffff8110e0f0>] ? atomic_notifier_chain_unregister+0x90/0x90
> [ 215.035641] [<ffffffff8119530b>] rcu_idle_enter+0xdb/0x130
> [ 215.035641] [<ffffffff81074636>] cpu_idle+0x106/0x160
> [ 215.035641] [<ffffffff835d1bb0>] start_secondary+0xd0/0xd7
> [ 215.035641] Code: 00 55 89 f9 48 89 e5 0f 32 45 31 c0 89 c7 48 89 d0 44 89 06 48 c1 e0 20 89 f9 5d 48 09 c8 c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 5d 48
> [ 214.997077] NMI backtrace for cpu 0
> [ 214.997077] CPU 0
> [ 214.997077] Pid: 2277, comm: kworker/0:2 Tainted: G W 3.5.0-sasha-01644-g824681b #267
> [ 214.997077] RIP: 0010:[<ffffffff8109bd5a>] [<ffffffff8109bd5a>] native_write_msr_safe+0xa/0x10
> [ 214.997077] RSP: 0018:ffff88000d803f70 EFLAGS: 00000046
> [ 214.997077] RAX: 0000000000000000 RBX: 000000000000ef80 RCX: 000000000000080b
> [ 214.997077] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000080b
> [ 214.997077] RBP: ffff88000d803f70 R08: 0000000000000000 R09: 0000000000000000
> [ 214.997077] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
> [ 214.997077] R13: ffffffff84854840 R14: 0000000000000286 R15: 00000000000493e0
> [ 214.997077] FS: 0000000000000000(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
> [ 214.997077] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 214.997077] CR2: 0000000000000000 CR3: 0000000012f58000 CR4: 00000000000406f0
> [ 214.997077] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 214.997077] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 214.997077] Process kworker/0:2 (pid: 2277, threadinfo ffff88000d782000, task ffff88000d653000)
> [ 214.997077] Stack:
> [ 214.997077] ffff88000d803f88 ffffffff81095d07 000000000000ef80 ffff88000d803fa8
> [ 214.997077] ffffffff810901c4 0000000000000002 ffff88000d653000 ffff88000d783cf0
> [ 214.997077] ffffffff8369c4af ffff88000d783c48 <EOI> ffff88000d783cf0 ffffffff8369aeb4
> [ 214.997077] Call Trace:
> [ 214.997077] <IRQ>
> [ 214.997077] [<ffffffff81095d07>] native_apic_msr_eoi_write+0x27/0x30
> [ 214.997077] [<ffffffff810901c4>] smp_apic_timer_interrupt+0x34/0xa0
> [ 214.997077] [<ffffffff8369c4af>] apic_timer_interrupt+0x6f/0x80
> [ 214.997077] <EOI>
> [ 214.997077] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13
> [ 214.997077] [<ffffffff8114b46d>] ? lock_is_held+0xed/0x110
> [ 214.997077] [<ffffffff8313ed87>] rt_check_expire+0x117/0x300
> [ 214.997077] [<ffffffff8313ef79>] rt_worker_func+0x9/0x20
> [ 214.997077] [<ffffffff810fa936>] process_one_work+0x376/0x700
> [ 214.997077] [<ffffffff810fa7f0>] ? process_one_work+0x230/0x700
> [ 214.997077] [<ffffffff8313ef70>] ? rt_check_expire+0x300/0x300
> [ 214.997077] [<ffffffff810fb508>] ? worker_thread+0x48/0x320
> [ 214.997077] [<ffffffff810fb69c>] worker_thread+0x1dc/0x320
> [ 214.997077] [<ffffffff810fb4c0>] ? manage_workers.isra.11+0x120/0x120
> [ 214.997077] [<ffffffff8110607d>] kthread+0xad/0xc0
> [ 214.997077] [<ffffffff8369cdb4>] kernel_thread_helper+0x4/0x10
> [ 214.997077] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13
> [ 214.997077] [<ffffffff81105fd0>] ? kthread_flush_work_fn+0x10/0x10
> [ 214.997077] [<ffffffff8369cdb0>] ? gs_change+0x13/0x13
> [ 214.997077] Code: 00 55 89 f9 48 89 e5 0f 32 45 31 c0 89 c7 48 89 d0 44 89 06 48 c1 e0 20 89 f9 5d 48 09 c8 c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 5d 48
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/