Re: [GIT PULL rcu/next] RCU commits for 4.6
From: Paul E. McKenney
Date: Tue Mar 08 2016 - 13:18:21 EST
On Tue, Mar 08, 2016 at 07:21:09AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 08, 2016 at 09:53:42AM +0100, Ingo Molnar wrote:
> > * Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
[ . . . ]
> > Pulled, thanks a lot Paul!
> >
> > So I've done the conflict resolutions with tip:smp/hotplug and tip:sched/core
> > myself, and came up with a mostly identical resolution, except this difference
> > with your resolution in wagi.2016.03.01a:
> >
> > --- linux-next/kernel/rcu/tree.c
> > +++ tip/kernel/rcu/tree.c
> > @@ -2046,8 +2046,8 @@ static void rcu_gp_cleanup(struct rcu_st
> >  	/* smp_mb() provided by prior unlock-lock pair. */
> >  	nocb += rcu_future_gp_cleanup(rsp, rnp);
> >  	sq = rcu_nocb_gp_get(rnp);
> > -	raw_spin_unlock_irq_rcu_node(rnp);
> >  	rcu_nocb_gp_cleanup(sq);
> > +	raw_spin_unlock_irq_rcu_node(rnp);
> >  	cond_resched_rcu_qs();
> >  	WRITE_ONCE(rsp->gp_activity, jiffies);
> >  	rcu_gp_slow(rsp, gp_cleanup_delay);
> >
> > but your resolution is better, rcu_nocb_gp_cleanup() can (and should) be done
> > outside of the rcu_node lock.
> >
> > So we have the same resolution now, which is good! ;-)
>
> Glad we were close!
>
> Just for purposes of satisfying curiosity, I am running rcutorture on your
> version. ;-)

And for whatever it is worth, in one of the sixteen rcutorture scenarios
lockdep complained as shown below.

On the other hand, your version quite possibly makes a lost-wakeup bug
happen more frequently. If my current quest to create a torture test
specific to this bug fails, I will revisit your patch. So despite the
lockdep splat, it is quite possible that I will be thanking you for it
at some point. ;-)

							Thanx, Paul
------------------------------------------------------------------------
[ 0.546319] =================================
[ 0.547000] [ INFO: inconsistent lock state ]
[ 0.547000] 4.5.0-rc6+ #1 Not tainted
[ 0.547000] ---------------------------------
[ 0.547000] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 0.547000] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 0.547000] (rcu_node_2){+.?...}, at: [<ffffffff810bd9e4>] rcu_process_callbacks+0xf4/0x860
[ 0.547000] {SOFTIRQ-ON-W} state was registered at:
[ 0.547000] [<ffffffff810a22c6>] mark_held_locks+0x66/0x90
[ 0.547000] [<ffffffff810a23e4>] trace_hardirqs_on_caller+0xf4/0x1c0
[ 0.547000] [<ffffffff810a24bd>] trace_hardirqs_on+0xd/0x10
[ 0.547000] [<ffffffff8197e987>] _raw_spin_unlock_irq+0x27/0x50
[ 0.547000] [<ffffffff8109cd36>] swake_up_all+0xb6/0xd0
[ 0.547000] [<ffffffff810bd375>] rcu_gp_kthread+0x835/0xaf0
[ 0.547000] [<ffffffff8107b04f>] kthread+0xdf/0x100
[ 0.547000] [<ffffffff8197f4ff>] ret_from_fork+0x3f/0x70
[ 0.547000] irq event stamp: 34721
[ 0.547000] hardirqs last enabled at (34720): [<ffffffff810baf33>] note_gp_changes+0x43/0xa0
[ 0.547000] hardirqs last disabled at (34721): [<ffffffff8197e767>] _raw_spin_lock_irqsave+0x17/0x60
[ 0.547000] softirqs last enabled at (34712): [<ffffffff8105d84c>] _local_bh_enable+0x1c/0x50
[ 0.547000] softirqs last disabled at (34713): [<ffffffff8105edb5>] irq_exit+0xa5/0xb0
[ 0.547000]
[ 0.547000] other info that might help us debug this:
[ 0.547000] Possible unsafe locking scenario:
[ 0.547000]
[ 0.547000] CPU0
[ 0.547000] ----
[ 0.547000] lock(rcu_node_2);
[ 0.547000] <Interrupt>
[ 0.547000] lock(rcu_node_2);
[ 0.547000]
[ 0.547000] *** DEADLOCK ***
[ 0.547000]
[ 0.547000] no locks held by swapper/0/0.
[ 0.547000]
[ 0.547000] stack backtrace:
[ 0.547000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.5.0-rc6+ #1
[ 0.547000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 0.547000] 0000000000000000 ffff88001fc03cc0 ffffffff813730ee ffffffff81e1d500
[ 0.547000] ffffffff83874a20 ffff88001fc03d10 ffffffff8113cf77 0000000000000001
[ 0.547000] ffffffff00000000 ffff880000000000 0000000000000006 ffffffff81e1d500
[ 0.547000] Call Trace:
[ 0.547000] <IRQ> [<ffffffff813730ee>] dump_stack+0x67/0x99
[ 0.547000] [<ffffffff8113cf77>] print_usage_bug+0x1f2/0x203
[ 0.547000] [<ffffffff810a1840>] ? check_usage_backwards+0x120/0x120
[ 0.547000] [<ffffffff810a21d2>] mark_lock+0x212/0x2a0
[ 0.547000] [<ffffffff810a2b17>] __lock_acquire+0x397/0x1b50
[ 0.547000] [<ffffffff8108ebce>] ? update_blocked_averages+0x3e/0x4a0
[ 0.547000] [<ffffffff8197e945>] ? _raw_spin_unlock_irqrestore+0x55/0x70
[ 0.547000] [<ffffffff810a2394>] ? trace_hardirqs_on_caller+0xa4/0x1c0
[ 0.547000] [<ffffffff8109625a>] ? rebalance_domains+0x10a/0x3b0
[ 0.547000] [<ffffffff810a4ae5>] lock_acquire+0xc5/0x1e0
[ 0.547000] [<ffffffff810bd9e4>] ? rcu_process_callbacks+0xf4/0x860
[ 0.547000] [<ffffffff8197e791>] _raw_spin_lock_irqsave+0x41/0x60
[ 0.547000] [<ffffffff810bd9e4>] ? rcu_process_callbacks+0xf4/0x860
[ 0.547000] [<ffffffff810bd9e4>] rcu_process_callbacks+0xf4/0x860
[ 0.547000] [<ffffffff810966c8>] ? run_rebalance_domains+0x1c8/0x1f0
[ 0.547000] [<ffffffff8105e7b9>] __do_softirq+0x139/0x490
[ 0.547000] [<ffffffff8105edb5>] irq_exit+0xa5/0xb0
[ 0.547000] [<ffffffff8103d2cd>] smp_apic_timer_interrupt+0x3d/0x50
[ 0.547000] [<ffffffff8197ff29>] apic_timer_interrupt+0x89/0x90
[ 0.547000] <EOI> [<ffffffff8100e738>] ? default_idle+0x18/0x1a0
[ 0.547000] [<ffffffff8100e736>] ? default_idle+0x16/0x1a0
[ 0.547000] [<ffffffff8100f11a>] arch_cpu_idle+0xa/0x10
[ 0.547000] [<ffffffff8109d035>] default_idle_call+0x25/0x40
[ 0.547000] [<ffffffff8109d2e8>] cpu_startup_entry+0x298/0x3c0
[ 0.547000] [<ffffffff8197768f>] rest_init+0x12f/0x140
[ 0.547000] [<ffffffff81977560>] ? csum_partial_copy_generic+0x170/0x170
[ 0.547000] [<ffffffff81f6ffd5>] start_kernel+0x435/0x442
[ 0.547000] [<ffffffff81f6f98e>] ? set_init_arg+0x55/0x55
[ 0.547000] [<ffffffff81f6f5ad>] x86_64_start_reservations+0x2a/0x2c
[ 0.547000] [<ffffffff81f6f699>] x86_64_start_kernel+0xea/0xed
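
One way to read the splat above: the {SOFTIRQ-ON-W} state was registered under swake_up_all() -> _raw_spin_unlock_irq(), that is, interrupts were re-enabled while the rcu_node lock was still held, and rcu_process_callbacks() later takes the same lock from softirq context. The user-space toy below sketches only that ordering issue. It is not kernel code, and every name in it (toy_lock, toy_wake_all, wake_inside_node_lock, and so on) is hypothetical, made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of one CPU's interrupt-enable flag (not the kernel's). */
static bool irqs_enabled = true;

struct toy_lock {
	bool held;
	bool held_with_irqs_on;	/* ever seen held while irqs were enabled? */
};

/* Mimics the shape of a *_lock_irq() call: disable irqs, then take lock. */
static void toy_lock_irq(struct toy_lock *l)
{
	irqs_enabled = false;
	l->held = true;
}

/* Mimics *_unlock_irq(): drop lock, then unconditionally enable irqs. */
static void toy_unlock_irq(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
	irqs_enabled = true;
}

/* Record the unsafe state: 'watched' still held but irqs enabled. */
static void toy_check(struct toy_lock *watched)
{
	if (watched->held && irqs_enabled)
		watched->held_with_irqs_on = true;
}

/*
 * Hypothetical stand-in for a wakeup helper that uses the _irq lock
 * variants on its own queue lock, so it returns with irqs enabled
 * no matter what state the caller was in.
 */
static void toy_wake_all(struct toy_lock *queue_lock, struct toy_lock *watched)
{
	toy_lock_irq(queue_lock);
	toy_unlock_irq(queue_lock);
	toy_check(watched);
}

/* Ordering on the '+' side of the quoted diff: wake, then unlock. */
static bool wake_inside_node_lock(void)
{
	struct toy_lock node = { false, false }, queue = { false, false };

	irqs_enabled = true;
	toy_lock_irq(&node);
	toy_wake_all(&queue, &node);	/* irqs come back on, node held */
	toy_unlock_irq(&node);
	return node.held_with_irqs_on;
}

/* Ordering on the '-' side of the quoted diff: unlock, then wake. */
static bool wake_outside_node_lock(void)
{
	struct toy_lock node = { false, false }, queue = { false, false };

	irqs_enabled = true;
	toy_lock_irq(&node);
	toy_unlock_irq(&node);
	toy_wake_all(&queue, &node);	/* node already released */
	return node.held_with_irqs_on;
}

/* Print which ordering leaves the node lock held with irqs enabled. */
static void toy_report(void)
{
	printf("wake inside node lock:  unsafe=%d\n", wake_inside_node_lock());
	printf("wake outside node lock: unsafe=%d\n", wake_outside_node_lock());
}
```

Calling toy_report() from a main() reports the unsafe state only for the wake-inside ordering, which is the window the "Possible unsafe locking scenario" in the splat describes: an interrupt arriving while the lock is held could try to take the same lock again.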