Re: rcu: WARNING in rcu_seq_end
From: Paul E. McKenney
Date: Sat Mar 04 2017 - 20:15:20 EST
On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote:
> Hello,
>
> Paul, you wanted bugs in rcu.
Well, whether I want them or not, I must deal with them. ;-)
> I've got this WARNING while running syzkaller fuzzer on
> 86292b33d4b79ee03e2f43ea0381ef85f077c760:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: events wait_rcu_exp_gp
> Call Trace:
> __dump_stack lib/dump_stack.c:15 [inline]
> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> panic+0x1fb/0x412 kernel/panic.c:179
> __warn+0x1c4/0x1e0 kernel/panic.c:540
> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
> rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
> rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
> rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
> wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
> process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
> worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
> kthread+0x326/0x3f0 kernel/kthread.c:227
> ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> Not reproducible. But looking at the code, shouldn't it be:
>
> static void rcu_seq_end(unsigned long *sp)
> {
> smp_mb(); /* Ensure update-side operation before counter increment. */
> + WARN_ON_ONCE(!(*sp & 0x1));
> WRITE_ONCE(*sp, *sp + 1);
> - WARN_ON_ONCE(*sp & 0x1);
> }
>
> ?
>
> Otherwise wait_event in _synchronize_rcu_expedited can return as soon
> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this
> consequently can allow start of next grace periods. Which in turn can
> make the warning fire. Am I missing something?
>
> I don't see any other bad consequences of this. The rest of
> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has
> returned and destroyed work on stack and next period has started and
> ended, but it seems OK.
I believe that this is a heygood change, but I don't see how it will
help in this case. BTW, may I have your Signed-off-by?
The reason I don't believe that it will help is that the
rcu_exp_gp_seq_end() function is called from a workqueue handler that
is invoked holding ->exp_mutex, and this mutex is not released until
after the handler invokes rcu_seq_end() and then wakes up the task that
scheduled the workqueue handler. So the ordering above should not matter
(but I agree that your ordering is cleaner.
That said, it looks like I am missing some memory barriers, please
see the following patch.
But what architecture did you see this on?
Thanx, Paul
------------------------------------------------------------------------
commit 1038d97908afb04750d0775748e2165a6e92d851
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Date: Sat Mar 4 12:33:53 2017 -0800
rcu: Expedited wakeups need to be fully ordered
Expedited grace periods use workqueue handlers that wake up the requesters,
but there is no lock mediating this wakeup. Therefore, memory barriers
are required to ensure that the handler's memory references are seen by
all to occur before synchronize_*_expedited() returns to its caller.
Possibly detected by syzkaller.
Reported-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 51ca287828a2..027e123d93c7 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -533,6 +533,7 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
rnp->exp_seq_rq = s;
spin_unlock(&rnp->exp_lock);
}
+ smp_mb(); /* All above changes before wakeup. */
wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]);
}
trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
@@ -614,6 +615,7 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
wait_event(rnp->exp_wq[(s >> 1) & 0x3],
sync_exp_work_done(rsp,
&rdp->exp_workdone0, s));
+ smp_mb(); /* Workqueue actions happen before return. */
/* Let the next expedited grace period start. */
mutex_unlock(&rsp->exp_mutex);