Re: [PATCH 03/11] rcu/nocb: Invoke rcu_core() at the start of deoffloading
From: Boqun Feng
Date: Wed Oct 13 2021 - 12:08:07 EST
Hi Frederic,
On Mon, Oct 11, 2021 at 04:51:32PM +0200, Frederic Weisbecker wrote:
> On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
> some work, such as callbacks acceleration and invocation, may be left
> unattended due to the volatile checks on the offloaded state.
>
> In the worst case this work is postponed until the next rcu_pending()
> check that can take a jiffy to reach, which can be a problem in case
> of callbacks flooding.
>
> Solve that by invoking rcu_core() early in the de-offloading process.
> This way any work dismissed by an ongoing rcu_core() call fooled by
> a preempting deoffloading process will be caught up by a nearby future
> call to rcu_core(), this time fully aware of the de-offloading state.
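
To make sure I follow the scenario being fixed, here is my own rough
sketch of the PREEMPT_RT race (my paraphrase, not part of the patch):

	rcu_core():
	  do_batch = !rcu_segcblist_completely_offloaded(...);
	  // ^ sees "offloaded", so this call skips callback handling
	  <preempted>
					rcu_nocb_rdp_deoffload():
					  <clears the offloaded state>
					  // callbacks are now rcu_core()'s
					  // job, but the ongoing call has
					  // already dismissed them
	  <resumes and returns>
	  // the dismissed work waits for the next rcu_pending() check,
	  // which may be up to a jiffy away

With the patch, rcu_nocb_rdp_deoffload() sets SEGCBLIST_RCU_CORE and
calls invoke_rcu_core() up front, so a nearby rcu_core() invocation
catches up on whatever the preempted one dismissed.
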
>
> Tested-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> Tested-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> Cc: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx>
> Cc: Uladzislau Rezki <urezki@xxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
>  include/linux/rcu_segcblist.h | 14 ++++++++++++++
>  kernel/rcu/rcu_segcblist.c    |  6 ++----
>  kernel/rcu/tree.c             | 17 +++++++++++++++++
>  kernel/rcu/tree_nocb.h        |  9 +++++++++
>  4 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> index 812961b1d064..659d13a7ddaa 100644
> --- a/include/linux/rcu_segcblist.h
> +++ b/include/linux/rcu_segcblist.h
> @@ -136,6 +136,20 @@ struct rcu_cblist {
>   *  |--------------------------------------------------------------------------|
>   *  |                           SEGCBLIST_RCU_CORE   |                          |
>   *  |                           SEGCBLIST_LOCKING    |                          |
> + *  |                           SEGCBLIST_OFFLOADED  |                          |
> + *  |                           SEGCBLIST_KTHREAD_CB |                          |
> + *  |                           SEGCBLIST_KTHREAD_GP                            |
> + *  |                                                                          |
> + *  |   CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core()    |
> + *  |   handles callbacks concurrently. Bypass enqueue is enabled.             |
> + *  |   Invoke RCU core so we make sure not to preempt it in the middle with   |
> + *  |   leaving some urgent work unattended within a jiffy.                    |
> + *  ----------------------------------------------------------------------------
> + *                                         |
> + *                                         v
> + *  |--------------------------------------------------------------------------|
> + *  |                           SEGCBLIST_RCU_CORE   |                          |
> + *  |                           SEGCBLIST_LOCKING    |                          |
>   *  |                           SEGCBLIST_KTHREAD_CB |                          |
>   *  |                           SEGCBLIST_KTHREAD_GP                            |
>   *  |                                                                          |
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index c07aab6e39ef..81145c3ece25 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -265,12 +265,10 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
>   */
>  void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
>  {
> -	if (offload) {
> +	if (offload)
>  		rcu_segcblist_set_flags(rsclp, SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED);
> -	} else {
> -		rcu_segcblist_set_flags(rsclp, SEGCBLIST_RCU_CORE);
> +	else
>  		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
> -	}
>  }
> 
>  /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e38028d48648..b236271b9022 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2717,6 +2717,23 @@ static __latent_entropy void rcu_core(void)
>  	unsigned long flags;
>  	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
>  	struct rcu_node *rnp = rdp->mynode;
> +	/*
> +	 * On RT rcu_core() can be preempted when IRQs aren't disabled.
> +	 * Therefore this function can race with concurrent NOCB (de-)offloading
> +	 * on this CPU and the below condition must be considered volatile.
> +	 * However if we race with:
> +	 *
> +	 * _ Offloading: In the worst case we accelerate or process callbacks
> +	 *               concurrently with NOCB kthreads. We are guaranteed to
> +	 *               call rcu_nocb_lock() if that happens.
If offloading races with rcu_core(), can the following happen?
	<offload work>
	rcu_nocb_rdp_offload():
						rcu_core():
						  ...
						  rcu_nocb_lock_irqsave(); // not a lock
	  raw_spin_lock_irqsave(->nocb_lock);
	  rdp_offload_toggle():
	    <LOCKING | OFFLOADED set>
						  if (!rcu_segcblist_restempty(...))
						    rcu_accelerate_cbs_unlocked(...);
						  rcu_nocb_unlock_irqrestore();
						  // ^ a real unlock,
						  // and will preempt_enable()
	  // offload continues with ->nocb_lock not held
If this can happen, it means an unpaired preempt_enable() and an
incorrect unlock. Thoughts? Maybe I'm missing something here?
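
For reference, my mental model of those two helpers is roughly the
following (a simplified sketch of kernel/rcu/tree.h and
kernel/rcu/tree_nocb.h, so details may differ from the actual code):

	/*
	 * Sketch: the lock side only takes ->nocb_lock when the rdp
	 * currently looks offloaded...
	 */
	#define rcu_nocb_lock_irqsave(rdp, flags)				\
	do {									\
		if (!rcu_segcblist_is_offloaded(&(rdp)->cblist))		\
			local_irq_save(flags);	/* the "not a lock" case */	\
		else								\
			raw_spin_lock_irqsave(&(rdp)->nocb_lock, (flags));	\
	} while (0)

	/*
	 * ...while the unlock side re-evaluates the offloaded state to
	 * decide whether a real unlock is needed.
	 */
	static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
					       unsigned long flags)
	{
		if (rcu_segcblist_is_offloaded(&rdp->cblist))
			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
		else
			local_irq_restore(flags);
	}

If the offloaded state can flip between those two checks, as in the
timeline above, the unlock side performs a real
raw_spin_unlock_irqrestore() (with its implied preempt_enable()) on a
lock the lock side never took.
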
Regards,
Boqun
> +	 *
> +	 * _ Deoffloading: In the worst case we miss callbacks acceleration or
> +	 *                 processing. This is fine because the early stage
> +	 *                 of deoffloading invokes rcu_core() after setting
> +	 *                 SEGCBLIST_RCU_CORE. So we guarantee that we'll process
> +	 *                 what could have been dismissed without the need to wait
> +	 *                 for the next rcu_pending() check in the next jiffy.
> +	 */
>  	const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
> 
>  	if (cpu_is_offline(smp_processor_id()))
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 71a28f50b40f..3b470113ae38 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>  	 * will refuse to put anything into the bypass.
>  	 */
>  	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> +	/*
> +	 * Start with invoking rcu_core() early. This way if the current thread
> +	 * happens to preempt an ongoing call to rcu_core() in the middle,
> +	 * leaving some work dismissed because rcu_core() still thinks the rdp is
> +	 * completely offloaded, we are guaranteed a nearby future instance of
> +	 * rcu_core() to catch up.
> +	 */
> +	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
> +	invoke_rcu_core();
>  	ret = rdp_offload_toggle(rdp, false, flags);
>  	swait_event_exclusive(rdp->nocb_state_wq,
>  			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> --
> 2.25.1
>