Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm

From: Paul E. McKenney
Date: Wed Feb 24 2021 - 13:37:55 EST


On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> Two situations can cause a missed nocb timer rearm:
>
> 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
> the timer get a chance to fire. The nocb_gp kthread is awaken by
> rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
> process the callbacks, again before the nocb_timer for CPU A get a
> chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
> kthread, cancelling the pending nocb_timer without resetting the
> corresponding nocb_defer_wakeup.

As discussed offlist, expanding the above scenario results in this
sequence of steps:

1. There are no callbacks queued for any CPU covered by CPU 0-2's
->nocb_gp_kthread.

2. CPU 0 enqueues its first callback with interrupts disabled, and
thus must defer awakening its ->nocb_gp_kthread. It therefore
queues its rcu_data structure's ->nocb_timer.

3. CPU 1, which shares the same ->nocb_gp_kthread, also enqueues a
callback, but with interrupts enabled, allowing it to directly
awaken the ->nocb_gp_kthread.

4. The newly awakened ->nocb_gp_kthread associates both CPU 0's
and CPU 1's callbacks with a future grace period and arranges
for that grace period to be started.

5. This ->nocb_gp_kthread goes to sleep waiting for the end of this
future grace period.

6. This grace period elapses before the CPU 0's timer fires.
This is normally improbably given that the timer is set for only
one jiffy, but timers can be delayed. Besides, it is possible
that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

7. The grace period ends, so rcu_gp_kthread awakens the
->nocb_gp_kthread, which in turn awakens both CPU 0's and
CPU 1's ->nocb_cb_kthread.

8. CPU 0's ->nocb_cb_kthread invokes its callback.

9. Note that neither kthread updated any ->nocb_timer state,
so CPU 0's ->nocb_defer_wakeup is still set to either
RCU_NOCB_WAKE or RCU_NOCB_WAKE_FORCE.

10. CPU 0 enqueues its second callback, again with interrupts
disabled, and thus must again defer awakening its
->nocb_gp_kthread. However, ->nocb_defer_wakeup prevents
CPU 0 from queueing the timer.

So far so good, but why isn't the timer still queued from back in step 2?
What am I missing here? Either way, could you please update the commit
logs to tell the full story? At some later time, you might be very
happy that you did. ;-)

Thanx, Paul

> 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
> the pending "nocb_timer" (note they are not the same timers) for the
> given rdp without resetting the matching state stored in nocb_defer
> wakeup.
>
> On both situations, a future call_rcu() on that rdp may be fooled and
> think the timer is armed when it's not, missing a deferred nocb_gp
> wakeup.
>
> Case 1) is very unlikely due to timing constraint (the timer fires after
> 1 jiffy) but still possible in theory. Case 2) is more likely to happen.
> But in any case such scenario require the CPU to spend a long time
> within a kernel thread without exiting to idle or user space, which is
> a pretty exotic behaviour.
>
> Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the
> timer.
>
> Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
> Cc: Stable <stable@xxxxxxxxxxxxxxx>
> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Cc: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx>
> Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> ---
> kernel/rcu/tree_plugin.h | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 2ec9d7f55f99..dd0dc66c282d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> rcu_nocb_unlock_irqrestore(rdp, flags);
> return false;
> }
> - del_timer(&rdp->nocb_timer);
> +
> + if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> + WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> + del_timer(&rdp->nocb_timer);
> + }
> rcu_nocb_unlock_irqrestore(rdp, flags);
> raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> @@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
> return false;
> }
> ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> - WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
> trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
>
> --
> 2.25.1
>