Re: rcu_read_lock() used illegally while idle!

From: Dave Jones
Date: Tue May 28 2013 - 16:15:05 EST


On Sat, May 25, 2013 at 02:20:04PM -0400, Steven Rostedt wrote:
> On Sat, 2013-05-25 at 09:59 -0400, Dave Jones wrote:
> > On Fri, May 24, 2013 at 03:23:40PM -0400, Steven Rostedt wrote:
> > > On Fri, 2013-05-24 at 10:23 -0400, Dave Jones wrote:
> > >
> > > > .config: http://paste.fedoraproject.org/14281/94052971/raw/
> > > >
> > > > trace shows the problem process was 'cc1', so I was likely building a kernel
> > > > at the time. There was also a trinity run going on in the background.
> > > >
> > > > cmdline: nothing special..
> > > >
> > > > BOOT_IMAGE=/vmlinuz-3.10.0-rc2+ root=UUID=bee21cd9-1852-4d1d-9e9d-2e44332b8df1 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 consoleblank=0 console=ttyUSB0,115200 console=tty0 pause_on_oops=30 audit=0
> > > >
> > >
> > > OK, I'm able to reproduce this. Looks like the same issue happens via
> > > other paths from userspace into the kernel (the next path was tracing
> > > system calls).
> > >
> > > Forget the other patch. Here's a new patch that is more specific to
> > > tracing and context tracking.
> > >
> > > Can you try this one out, please?
> >
> > Took a lot longer to hit this..
>
> This is a same-but-different bug ;-)
>
> Looks like we fixed all the function tracing infrastructure problems,
> but this is a function tracer user problem. Namely perf.

another variant of the same? or different different?

[12572.705832] ======================================================
[12572.750317] [ INFO: possible circular locking dependency detected ]
[12572.796978] 3.10.0-rc3+ #39 Not tainted
[12572.833381] -------------------------------------------------------
[12572.862233] trinity-child17/31341 is trying to acquire lock:
[12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
[12572.878859]
but task is already holding lock:
[12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
[12572.903381]
which lock already depends on the new lock.

[12572.927541]
the existing dependency chain (in reverse order) is:
[12572.943736]
-> #4 (&ctx->lock){-.-...}:
[12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
[12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
[12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
[12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
[12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70
[12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e
[12573.018321]
-> #3 (&rq->lock){-.-.-.}:
[12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
[12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
[12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470
[12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30
[12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140
[12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
[12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
[12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
[12573.109528]
-> #2 (&p->pi_lock){-.-.-.}:
[12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
[12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
[12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
[12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
[12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90
[12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50
[12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
[12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
[12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
[12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100
[12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
[12573.221379]
-> #1 (&rsp->gp_wq){..-.-.}:
[12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
[12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50
[12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
[12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
[12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
[12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100
[12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
[12573.295067]
-> #0 (rcu_node_0){..-.-.}:
[12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
[12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
[12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
[12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
[12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
[12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
[12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
[12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
[12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
[12573.382068]
other info that might help us debug this:

[12573.403229] Chain exists of:
rcu_node_0 --> &rq->lock --> &ctx->lock

[12573.424471] Possible unsafe locking scenario:

[12573.438499]        CPU0                    CPU1
[12573.445599]        ----                    ----
[12573.452691]   lock(&ctx->lock);
[12573.459799]                                lock(&rq->lock);
[12573.467010]                                lock(&ctx->lock);
[12573.474192]   lock(rcu_node_0);
[12573.481262]
*** DEADLOCK ***

[12573.501931] 1 lock held by trinity-child17/31341:
[12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
[12573.516475]
stack backtrace:
[12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
[12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
[12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
[12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
[12573.567856] Call Trace:
[12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b
[12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
[12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
[12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
[12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
[12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
[12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
[12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
[12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
[12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
[12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
[12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
[12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
[12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
[12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
[12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
[12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
[12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
[12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
[12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
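
For reference, the trace pins this on perf_lock_task_context() ending its
RCU read-side critical section while ctx->lock is still held. Below is a
minimal sketch of that pattern, paraphrased from the 3.10-rc code for
illustration; the re-check/retry loop and refcounting are abridged, so
don't treat it as the exact upstream source:

/*
 * Sketch of perf_lock_task_context() circa 3.10-rc, abridged.
 */
static struct perf_event_context *
perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
{
        struct perf_event_context *ctx;

        rcu_read_lock();
        ctx = rcu_dereference(task->perf_event_ctxp[ctxn]);
        if (ctx)
                raw_spin_lock_irqsave(&ctx->lock, *flags);  /* lock #4 above */
        /*
         * Still holding ctx->lock here. If this read-side critical
         * section was preempted (or priority-boosted), rcu_read_unlock()
         * falls through to rcu_read_unlock_special(), which can take the
         * rcu_node lock (#0 above) -- closing the
         * rcu_node_0 --> &rq->lock --> &ctx->lock cycle lockdep reports.
         */
        rcu_read_unlock();
        return ctx;
}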
