Re: 4.2-rc5 rcu stalls.

From: Frederic Weisbecker
Date: Wed Aug 05 2015 - 08:38:08 EST


On Tue, Aug 04, 2015 at 08:12:50PM -0400, Dave Jones wrote:
> On Tue, Aug 04, 2015 at 12:54:35AM -0400, Sasha Levin wrote:
> > On 08/03/2015 06:03 PM, Paul E. McKenney wrote:
> > >> > Ugh, that doesn't revert cleanly. Got something handy ?
> > > I do not, but perhaps either Sasha or Frederic do.
> >
> > I've attached a revert courtesy of Peter.
>
> Thanks. At first I thought this was doing the trick, but then I hit this again.
>
>
> [23643.545873] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [23643.546031] Tasks blocked on level-0 rcu_node (CPUs 0-3): P31722
> [23643.546173] (detected by 3, t=65002 jiffies, g=2256887, c=2256886, q=0)
> [23643.546326] trinity-watchdo R running task 14336 31722 31721 0x00080000
> [23643.546488] ffff8804fcfe7cc8 000000000000ded0 0000000000000002 ffff8804f58bb680
> [23643.546661] ffff8800ce4951c0 ffff8804fcfe7cb8 ffff8804fcfe8000 ffff8804f6552608
> [23643.546830] 0000000000000009 ffff8804fcfe7e88 0000000000000009 ffff8804fcfe7ce8
> [23643.547001] Call Trace:
> [23643.547058] [<ffffffff887fa2b2>] preempt_schedule_common+0x22/0x40
> [23643.547201] [<ffffffff887fa2ef>] preempt_schedule+0x1f/0x30
> [23643.547329] [<ffffffff88001058>] ___preempt_schedule+0x12/0x14
> [23643.547465] [<ffffffff8808b76d>] ? do_send_sig_info+0x5d/0x80
> [23643.547599] [<ffffffff887fff32>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [23643.547753] [<ffffffff887fff50>] ? _raw_spin_unlock_irqrestore+0x60/0x70
> [23643.547910] [<ffffffff8808b76d>] do_send_sig_info+0x5d/0x80
> [23643.548039] [<ffffffff8808be62>] group_send_sig_info+0xb2/0x120
> [23643.548175] [<ffffffff8808bdb5>] ? group_send_sig_info+0x5/0x120
> [23643.548314] [<ffffffff880ea62f>] ? rcu_read_lock_held+0x4f/0x60
> [23643.548451] [<ffffffff8808c05f>] kill_pid_info+0x7f/0x150
> [23643.548576] [<ffffffff8808c000>] ? kill_pid_info+0x20/0x150
> [23643.548705] [<ffffffff8808c244>] SYSC_kill+0xf4/0x2b0
> [23643.548821] [<ffffffff8808c1ed>] ? SYSC_kill+0x9d/0x2b0
> [23643.548942] [<ffffffff880d35cb>] ? trace_hardirqs_on_caller+0x14b/0x1e0
> [23643.549097] [<ffffffff880d366d>] ? trace_hardirqs_on+0xd/0x10
> [23643.549231] [<ffffffff88192f63>] ? context_tracking_user_exit+0x13/0x20
> [23643.549387] [<ffffffff88012c47>] ? syscall_trace_enter_phase1+0xf7/0x150
> [23643.549540] [<ffffffff88001017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [23643.549687] [<ffffffff8808e64e>] SyS_kill+0xe/0x10
> [23643.549799] [<ffffffff88800997>] entry_SYSCALL_64_fastpath+0x12/0x6f

If it still happens after Sasha's revert, which basically revert all the offending
patches related to preempt lately, then the reason might be elsewhere.

How hard was it to reproduce? I see 23000 secs in your dmesg logs which is around 6 hours.

Also did you just launch trinity? no specific options?
I'll try to reproduce that.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/