Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT
From: Frederic Weisbecker
Date: Tue Sep 21 2021 - 19:36:36 EST
On Tue, Sep 21, 2021 at 11:12:50PM +0200, Thomas Gleixner wrote:
> Valentin reported warnings about suspicious RCU usage on RT kernels. Those
> happen when offloading of RCU callbacks is enabled:
>
> WARNING: suspicious RCU usage
> 5.13.0-rt1 #20 Not tainted
> -----------------------------
> kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
>
> rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
> rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
> rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
>
> The reason is that rcu_rdp_is_offloaded() is invoked without one of the
> required protections on RT enabled kernels because local_bh_disable() does
> not disable preemption on RT.
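(For reference, the check that fires is roughly the following, paraphrased from
kernel/rcu/tree_plugin.h of that era; exact details may vary between versions:)

static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
{
	/*
	 * Reading the offloaded state is only safe if that state cannot
	 * change under us: barrier mutex, CPU hotplug lock, nocb lock,
	 * or a local non-preemptible read.  On RT, local_bh_disable()
	 * no longer implies the latter, hence the splat.
	 */
	RCU_LOCKDEP_WARN(
		!(lockdep_is_held(&rcu_state.barrier_mutex) ||
		  (IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_held()) ||
		  rcu_lockdep_is_held_nocb(rdp) ||
		  (rdp == this_cpu_ptr(&rcu_data) &&
		   !(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible())) ||
		  rcu_current_is_nocb_kthread(rdp)),
		"Unsafe read of RCU_NOCB offloaded state"
	);

	return rcu_segcblist_is_offloaded(&rdp->cblist);
}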
>
> Valentin proposed to add a local lock to the code in question, but that's
> suboptimal in several aspects:
>
> 1) local locks add extra code to !RT kernels for no value.
>
> 2) All possible callsites have to be audited and amended when affected,
>    possibly at an outer function level due to lock nesting issues.
>
> 3) As the local lock has to be taken at the outer functions it's required
>    to release and reacquire it in the inner code sections which might
>    voluntarily schedule, e.g. rcu_do_batch().
>
> Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
> rcu_rdp_is_offloaded() in the variable declaration section right at the top
> of the functions. But the actual usage of the result is either within a
> section which provides the required protections or after such a section.
>
> So the obvious solution is to move the invocation into the code sections
> which provide the proper protections, which solves the problem for RT and
> does not have any impact on !RT kernels.
You also need to consider rcu_segcblist_completely_offloaded(). There
are two users:
1) The first chunk using it in rcu_core() checks whether there is a need to
   accelerate the callbacks, and that can happen concurrently with nocb
   manipulations on the cblist. Concurrent (de-)offloading could mess up
   the locking state, but here is what we can do:
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bce848e50512..3e56a1a4af03 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2728,9 +2728,10 @@ static __latent_entropy void rcu_core(void)
 	/* No grace period and unregistered callbacks? */
 	if (!rcu_gp_in_progress() &&
-	    rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
+	    rcu_segcblist_is_enabled(&rdp->cblist)) {
 		rcu_nocb_lock_irqsave(rdp, flags);
-		if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
+		if (!rcu_segcblist_completely_offloaded(&rdp->cblist) &&
+		    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
 			rcu_accelerate_cbs_unlocked(rnp, rdp);
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 	}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 305cf6aeb408..64d615be3346 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -449,10 +449,9 @@ static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
 static void __init rcu_organize_nocb_kthreads(void);
 #define rcu_nocb_lock_irqsave(rdp, flags) \
 do { \
-	if (!rcu_segcblist_is_offloaded(&(rdp)->cblist)) \
-		local_irq_save(flags); \
-	else \
-		raw_spin_lock_irqsave(&(rdp)->nocb_lock, (flags)); \
+	local_irq_save(flags); \
+	if (rcu_segcblist_is_offloaded(&(rdp)->cblist)) \
+		raw_spin_lock(&(rdp)->nocb_lock); \
 } while (0)
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 #define rcu_nocb_lock_irqsave(rdp, flags) local_irq_save(flags)
Doing the local_irq_save() before checking whether the segcblist is offloaded
protects that state from being changed (provided we lock the local rdp). Then
we can safely manipulate the cblist, whether locked or unlocked.
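If that holds, the unlock counterpart would want the mirrored ordering; something
like this (just a sketch, not part of the diff above, assuming the offloaded state
indeed cannot change while IRQs are off on the local CPU):

static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
				       unsigned long flags)
{
	/* Drop the nocb lock first (if it was taken), then restore IRQs. */
	if (rcu_segcblist_is_offloaded(&rdp->cblist))
		raw_spin_unlock(&rdp->nocb_lock);
	local_irq_restore(flags);
}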
2) The actual call to rcu_do_batch(). If we are preempted between
   rcu_segcblist_completely_offloaded() and rcu_do_batch() with a deoffload in
   the middle, we miss the callback invocation. Invoking rcu_core() at the end
   of the deoffloading process should solve that.
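Something along these lines, to be called on the target CPU once de-offloading
has completed (the helper name and the hook point are only an illustration,
nothing of this exists yet):

/* Illustration only: kick rcu_core() on the freshly de-offloaded CPU. */
static void rcu_deoffload_kick_rcu_core(struct rcu_data *rdp)
{
	/* Assumes we run on rdp->cpu, so the softirq/rcuc kick stays local. */
	if (rcu_segcblist_n_cbs(&rdp->cblist))
		invoke_rcu_core();
}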
>
> Reported-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
> kernel/rcu/tree.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2278,13 +2278,13 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> -	bool needwake = false;
> -	const bool offloaded = rcu_rdp_is_offloaded(rdp);
> +	bool offloaded, needwake = false;
>  	struct rcu_node *rnp;
>  
>  	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
>  	rnp = rdp->mynode;
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +	offloaded = rcu_rdp_is_offloaded(rdp);
>  	if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
>  	    rdp->gpwrap) {
BTW Paul, if we happen to switch to non-NOCB (deoffload) some time after
rcu_report_qs_rdp(), it's possible that the rcu_accelerate_cbs()
that was supposed to be handled by nocb kthreads on behalf of
rcu_core() -> rcu_report_qs_rdp() would not happen. At least not until
we invoke rcu_core() again. Not sure how much harm that could cause.
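(The spot I mean is roughly this, paraphrased from rcu_report_qs_rdp():)

	/*
	 * This GP can't end until the CPU checks in, so its callbacks can
	 * be processed during the next GP; for offloaded rdps the nocb
	 * kthreads take care of the acceleration instead.
	 */
	if (!offloaded)
		needwake = rcu_accelerate_cbs(rnp, rdp);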
> @@ -2446,7 +2446,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	int div;
>  	bool __maybe_unused empty;
>  	unsigned long flags;
> -	const bool offloaded = rcu_rdp_is_offloaded(rdp);
> +	bool offloaded;
>  	struct rcu_head *rhp;
>  	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
>  	long bl, count = 0;
> @@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	rcu_nocb_lock(rdp);
>  	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
>  	pending = rcu_segcblist_n_cbs(&rdp->cblist);
> +	offloaded = rcu_rdp_is_offloaded(rdp);
offloaded is also checked later in rcu_do_batch(), after irqrestore. In fact
that should even become a rcu_segcblist_completely_offloaded() check there,
because if there are still pending callbacks while we are de-offloading,
rcu_core() should be invoked. Hmm, but that might be a remote rcu_core()...
Anyway I guess we could live with some of those races by invoking rcu_core()
on the target after deoffloading.
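I mean something like this at the tail of rcu_do_batch() (just a sketch, the
surrounding context is paraphrased):

	/* Re-invoke RCU core processing if there are callbacks remaining. */
	if (!rcu_segcblist_completely_offloaded(&rdp->cblist) &&
	    rcu_segcblist_ready_cbs(&rdp->cblist))
		invoke_rcu_core();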
I guess I should cook a series to handle all these issues one by one, then
probably we can live without a local_lock().
Thanks.
>  	div = READ_ONCE(rcu_divisor);
>  	div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
>  	bl = max(rdp->blimit, pending >> div);