Re: rcu: frequent rcu lockups

From: Sasha Levin
Date: Thu Mar 12 2015 - 08:28:50 EST


On 03/11/2015 07:16 PM, Paul E. McKenney wrote:
> On Wed, Mar 11, 2015 at 07:06:40PM -0400, Sasha Levin wrote:
>> On 03/11/2015 07:01 PM, Paul E. McKenney wrote:
>>>> With the commit I didn't hit it yet, but I do see 4 different WARNings:
>>> I wish that I could say that I am surprised, but the sad fact is that
>>> I am still shaking the bugs out.
>>
>> I have one more to add:
>>
>> [ 93.330539] WARNING: CPU: 1 PID: 8 at kernel/rcu/tree_plugin.h:476 rcu_gp_kthread+0x1eaa/0x4dd0()
> A bit different, but still in the class of a combining-tree bitmask
> handling bug.

I left it running overnight and am still seeing hangs, although (and don't
hold me to this) there seem to be significantly fewer of them.

[ 4423.001809] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 4423.001809] Tasks blocked on level-1 rcu_node (CPUs 16-31):
[ 4423.001809] (detected by 0, t=30502 jiffies, g=60989, c=60988, q=18648)
[ 4423.001809] All QSes seen, last rcu_preempt kthread activity 1 (4295375352-4295375351), jiffies_till_next_fqs=1, root ->qsmask 0x2
[ 4423.001809] trinity-c0 R running task 27480 15862 9833 0x10080000
[ 4423.001809] 0000000000002669 00000000ac401e1d ffff880050607de8 ffffffff9327679b
[ 4423.001809] ffff880050607db8 ffffffffa0b36000 0000000000000001 00000001000639f7
[ 4423.001809] ffffffffa0b351c8 dffffc0000000000 ffff880050622000 ffffffffa0721000
[ 4423.001809] Call Trace:
[ 4423.001809] <IRQ> sched_show_task (kernel/sched/core.c:4542)
[ 4423.001809] rcu_check_callbacks (kernel/rcu/tree.c:1225 kernel/rcu/tree.c:1331 kernel/rcu/tree.c:3400 kernel/rcu/tree.c:3464 kernel/rcu/tree.c:2682)
[ 4423.001809] ? acct_account_cputime (kernel/tsacct.c:168)
[ 4423.001809] update_process_times (./arch/x86/include/asm/preempt.h:22 kernel/time/timer.c:1386)
[ 4423.001809] tick_periodic (kernel/time/tick-common.c:92)
[ 4423.001809] ? tick_handle_periodic (kernel/time/tick-common.c:105)
[ 4423.001809] tick_handle_periodic (kernel/time/tick-common.c:105)
[ 4423.001809] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:891)
[ 4423.001809] ? irq_enter (kernel/softirq.c:338)
[ 4423.001809] smp_apic_timer_interrupt (./arch/x86/include/asm/apic.h:650 arch/x86/kernel/apic/apic.c:915)
[ 4423.001809] apic_timer_interrupt (arch/x86/kernel/entry_64.S:920)
[ 4423.001809] <EOI> ? remove_wait_queue (include/linux/wait.h:145 kernel/sched/wait.c:50)
[ 4423.001809] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:162 kernel/locking/spinlock.c:191)
[ 4423.001809] remove_wait_queue (kernel/sched/wait.c:52)
[ 4423.001809] do_wait (kernel/exit.c:1465 (discriminator 1))
[ 4423.001809] ? wait_consider_task (kernel/exit.c:1465)
[ 4423.001809] ? find_get_pid (kernel/pid.c:490)
[ 4423.001809] SyS_wait4 (kernel/exit.c:1618 kernel/exit.c:1586)
[ 4423.001809] ? SyS_waitid (kernel/exit.c:1586)
[ 4423.001809] ? kill_orphaned_pgrp (kernel/exit.c:1444)
[ 4423.001809] ? syscall_trace_enter_phase2 (arch/x86/kernel/ptrace.c:1592)
[ 4423.001809] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
[ 4423.001809] tracesys_phase2 (arch/x86/kernel/entry_64.S:347)


Thanks,
Sasha