Re: net, phonet, rcu: rcu hang within gprs_attach
From: Sasha Levin
Date: Fri Jul 25 2014 - 19:23:21 EST
On 07/25/2014 07:19 PM, Paul E. McKenney wrote:
> On Thu, Jul 24, 2014 at 07:28:35PM -0400, Sasha Levin wrote:
>> > On 07/24/2014 06:54 PM, Paul E. McKenney wrote:
>>> > > On Thu, Jul 24, 2014 at 06:19:11PM -0400, Sasha Levin wrote:
>>>> > >> Hi all,
>>>> > >>
>>>> > >> While fuzzing with trinity inside a KVM tools guest running the latest -next
>>>> > >> kernel I've stumbled on the following stack trace (full log attached):
>>>> > >>
>>>> > >> [ 370.662014] INFO: task trinity-main:8727 blocked for more than 120 seconds.
>>>> > >> [ 370.662891] Not tainted 3.16.0-rc6-next-20140724-sasha-00046-g7324c87-dirty #932
>>>> > >> [ 370.663655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> > >> [ 370.664562] trinity-main D ffff88053cc80000 13064 8727 8714 0x00000000
>>>> > >> [ 370.665328] ffff88053da6fc10 0000000000000002 ffff8805483e2dc8 ffff880541873000
>>>> > >> [ 370.666147] 000000276ed30787 ffff88053da6c010 ffff88053da6c000 ffff8805452a0000
>>>> > >> [ 370.667243] ffff880541873000 0000000000000000 7fffffffffffffff ffffffffb3ec51d8
>>>> > >> [ 370.668788] Call Trace:
>>>> > >> [ 370.669118] schedule (kernel/sched/core.c:2847)
>>>> > >> [ 370.670538] schedule_timeout (kernel/time/timer.c:1476)
>>>> > >> [ 370.671524] ? mark_lock (kernel/locking/lockdep.c:2894)
>>>> > >> [ 370.672299] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
>>>> > >> [ 370.673227] ? get_parent_ip (kernel/sched/core.c:2561)
>>>> > >> [ 370.674085] wait_for_completion (include/linux/spinlock.h:328 kernel/sched/completion.c:76 kernel/sched/completion.c:93 kernel/sched/completion.c:101 kernel/sched/completion.c:122)
>>>> > >> [ 370.674960] ? wake_up_state (kernel/sched/core.c:2942)
>>>> > >> [ 370.675576] _rcu_barrier (kernel/rcu/tree.c:3325 (discriminator 8))
>>>> > >> [ 370.676109] rcu_barrier (kernel/rcu/tree_plugin.h:920)
>>>> > >> [ 370.676627] netdev_run_todo (net/core/dev.c:6323)
>>>> > >> [ 370.677202] rtnl_unlock (net/core/rtnetlink.c:80)
>>>> > >> [ 370.677714] unregister_netdev (net/core/dev.c:6687)
>>>> > >> [ 370.678266] gprs_attach (net/phonet/pep-gprs.c:311)
>>>> > >> [ 370.679641] pep_setsockopt (net/phonet/pep.c:1016)
>>>> > >> [ 370.681082] sock_common_setsockopt (net/core/sock.c:2603)
>>>> > >> [ 370.682048] SyS_setsockopt (net/socket.c:1914 net/socket.c:1894)
>>>> > >> [ 370.682854] tracesys (arch/x86/kernel/entry_64.S:541)
>>>> > >> [ 370.683586] 1 lock held by trinity-main/8727:
>>>> > >> [ 370.684232] #0: (rcu_preempt_state.barrier_mutex){+.+...}, at: _rcu_barrier (kernel/rcu/tree.c:3233)
>>>> > >>
>>>> > >> This has reproduced a couple of times, and has always originated from gprs_attach. I don't see any obvious
>>>> > >> issues with the code there, so I'm not sure whether the fault is in the phonet code or the RCU code.
>>> > >
>>> > > Can't tell much from this. Any chance of a .config?
>>> > >
>>> > > Thanx, Paul
>>> > >
>> >
>> > Attached.
> If you were doing partial nohz_full= CPUs, there is a recent RCU bug
> that would result in these symptoms. No idea how you would make it
> happen without specifying the nohz_full= boot parameter, but I should
> be getting the fix into -next in a few days.
>
> But you never know. So if you are interested in testing sooner, and if
> my local tests pass, I could send you a modified patch that applies on
> top of rcu/next. If you would like such a patch, let me know.
Sure, if you Cc me on it I'll be happy to test it out. Just don't go out
of your way: I've disabled phonet for now anyway, so it's not really
delaying me.
Thanks,
Sasha