Re: [PATCH 05/11] rcu: clean all rcu_read_unlock_special after report qs

From: Paul E. McKenney
Date: Fri Nov 01 2019 - 07:54:58 EST


On Thu, Oct 31, 2019 at 10:08:00AM +0000, Lai Jiangshan wrote:
> rcu_preempt_deferred_qs_irqrestore() is always called when
> ->rcu_read_lock_nesting <= 0 and there is nothing to prevent it
> from reporting qs if needed which means ->rcu_read_unlock_special
> is better to be clearred after the function. But in some cases,
> it leaves exp_hint (for example, after rcu_note_context_switch()),
> which is harmess since it is just a hint, but it may also intruduce
> unneeded rcu_read_unlock_special() later.
>
> The new code write to exp_hint without WRITE_ONCE() since the
> writing is protected by irq.
>
> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>

This does look good, thank you!

Though for a rather different reason that called out in the commit log.
Note that .exp_hint is in task_struct, and thus follows the task, and is
currently unconditionally cleared by the next rcu_read_unlock_special(),
which will be invoked because .exp_hint is non-zero. So if
rcu_note_context_switch() leaves .exp_hint set, that is all to the good
because the next rcu_read_unlock() executed by this task once resumed,
and because that rcu_read_unlock() might be transitioning to a quiescent
state required in order for the expedited grace period to end.

Nevertheless, clearing .exp_hint in rcu_preempt_deferred_qs_irqrestore()
instead of rcu_read_unlock_special() is a good thing. This is because
in the case where it is not possible to arrange an event immediately
following reenabling, it is good to enable any rcu_read_unlock()s that
might be executed to help us out.

Or am I missing something here?

> ---
> kernel/rcu/tree_plugin.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 9fe8138ed3c3..bb3bcdb5c9b8 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -439,6 +439,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> * t->rcu_read_unlock_special cannot change.
> */
> special = t->rcu_read_unlock_special;
> + t->rcu_read_unlock_special.b.exp_hint = false;

Interrupts are disabled by this time, so yes, it is safe for this to be
a simple assignment.

> t->rcu_read_unlock_special.b.deferred_qs = false;
> if (special.b.need_qs) {
> rcu_qs();
> @@ -626,7 +627,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
> local_irq_restore(flags);
> return;
> }
> - WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);

And reaching the above assignment is inevitable at this point,
so this assignment can go.

But can't we also get rid of the assignment under the preceding "if"
statement? Please see below for a version of your patch that includes
this proposed change along with an updated commit log. Please let me
know if I have messed anything up.

> rcu_preempt_deferred_qs_irqrestore(t, flags);
> }

------------------------------------------------------------------------

commit 6e3f2b1d6aba24ad6ae8deb2ce98b93d527227ae
Author: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
Date: Fri Nov 1 04:06:22 2019 -0700

rcu: Clear .exp_hint only when deferred quiescent state has been reported

Currently, the .exp_hint flag is cleared in rcu_read_unlock_special(),
which works, but which can also prevent subsequent rcu_read_unlock() calls
from helping expedite the quiescent state needed by an ongoing expedited
RCU grace period. This commit therefore defers clearing of .exp_hint
from rcu_read_unlock_special() to rcu_preempt_deferred_qs_irqrestore(),
thus ensuring that intervening calls to rcu_read_unlock() have a chance
to help end the expedited grace period.

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d4c4824..677ee1c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -439,6 +439,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
* t->rcu_read_unlock_special cannot change.
*/
special = t->rcu_read_unlock_special;
+ t->rcu_read_unlock_special.b.exp_hint = false;
rdp = this_cpu_ptr(&rcu_data);
if (!special.s && !rdp->exp_deferred_qs) {
local_irq_restore(flags);
@@ -610,7 +611,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;

- t->rcu_read_unlock_special.b.exp_hint = false;
exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
(rdp->grpmask & rnp->expmask) ||
tick_nohz_full_cpu(rdp->cpu);
@@ -640,7 +640,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
local_irq_restore(flags);
return;
}
- WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
rcu_preempt_deferred_qs_irqrestore(t, flags);
}