Re: [BUG] rcu-tasks : should take care of sparse cpu masks

From: Eric Dumazet
Date: Mon Apr 04 2022 - 19:07:09 EST


On Mon, Apr 4, 2022 at 12:49 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Thu, Mar 31, 2022 at 03:54:02PM -0700, Eric Dumazet wrote:
> > On Thu, Mar 31, 2022 at 3:42 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Mar 31, 2022 at 02:45:25PM -0700, Eric Dumazet wrote:
> > > > Hi Paul
> > > >
> > > > It seems you assume that per-cpu pointers for arbitrary indexes (< nr_cpu_ids) are valid.
> > >
> > > Gah! I knew I was forgetting something...
> > >
> > > But just to check, is this a theoretical problem or something you hit
> > > on real hardware? (For the rest of this email, I am assuming the latter.)
> >
> > Code review really...
> >
> > >
> > > > What do you think of the (untested) following patch ?
> > >
> > > One issue with this patch is that the contention could be unpredictable,
> > > or worse, vary among CPUs, especially if the cpu_possible_mask were oddly
> > > distributed.
> > >
> > > So might it be better to restrict this to all on CPU 0 on the one hand
> > > and completely per-CPU on the other? (Or all on the boot CPU, in case
> > > I am forgetting some misbegotten architecture that can run without a
> > > CPU 0.)
> >
> > If I understand correctly, cblist_init_generic() could set up
> > percpu_enqueue_shift to something smaller than order_base_2(nr_cpu_ids).
> >
> > Meaning that we could reach a non-zero idx in
> > (smp_processor_id() >> percpu_enqueue_shift).
> >
> > So even if CPU0 is always present (I am not sure this is guaranteed,
> > but it seems reasonable), we could still attempt a
> > per_cpu_ptr(PTR, not_present_cpu) and get garbage.
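
To make the failure mode concrete, here is a minimal userspace sketch
(the CPU numbering and shift value are made up for illustration; this
is not the kernel code):

#include <stdio.h>

/* Pretend only CPUs 0, 4, 8, 12 and 16 are possible, and that
 * rcupdate.rcu_task_enqueue_lim made cblist_init_generic() pick a
 * shift of 2 (hypothetical values).
 */
int main(void)
{
	const int possible[] = { 0, 4, 8, 12, 16 };
	const int shift = 2;
	unsigned long i;

	for (i = 0; i < sizeof(possible) / sizeof(possible[0]); i++) {
		int cpu = possible[i];
		int idx = cpu >> shift; /* index handed to per_cpu_ptr() */

		/* Prints indexes 0, 1, 2, 3, 4.  CPUs 1, 2 and 3 were
		 * never brought up, so per_cpu_ptr(rtp->rtpcpu, idx)
		 * would return garbage for them, as noted above.
		 */
		printf("cpu %2d -> queue index %d\n", cpu, idx);
	}
	return 0;
}
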
>
> And the problem with my wish to provide load balancing is that a
> sparse cpumask could be sparse any which way that it wants to be.
> Another problem is that, unlike TREE SRCU, Tasks RCU doesn't have an
> efficient way to find all the CPUs with callbacks queued. Yes, I could
> add that information, but the benefit does not seem worth the complexity.
>
> So I took your patch after all, but changed from cpu_online_mask to
> cpu_possible_mask. Thank you for bearing with me on this one!
>
> Are you OK with your Signed-off-by on this patch as shown below?

Absolutely, thanks Paul for taking care of this.

>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit b77b2981bb22c4449a0a6e86eeb9fbab36a2beae
> Author: Eric Dumazet <edumazet@xxxxxxxxxx>
> Date: Mon Apr 4 12:30:18 2022 -0700
>
> rcu-tasks: Handle sparse cpu_possible_mask
>
> If the rcupdate.rcu_task_enqueue_lim kernel boot parameter is set to
> something greater than 1 and less than nr_cpu_ids, the code attempts to
> use a subset of the CPUs' RCU Tasks callback lists. This works, but only
> if the cpu_possible_mask is contiguous. If there are "holes" in this
> mask, the callback-enqueue code might attempt to access a non-existent
> per-CPU ->rtpcpu variable for a non-existent CPU. For example, if only
> CPUs 0, 4, 8, 12, 16 and so on are in cpu_possible_mask, specifying
> rcupdate.rcu_task_enqueue_lim=4 would cause the code to attempt to
> use callback queues for non-existent CPUs 1, 2, and 3. Because such
> systems have existed in the past and might still exist, the code needs
> to gracefully handle this situation.
>
> This commit therefore checks to see whether the desired CPU is present
> in cpu_possible_mask, and, if not, searches for the next CPU. This means
> that the systems administrator of a system with a sparse cpu_possible_mask
> will need to account for this sparsity when specifying the value of
> the rcupdate.rcu_task_enqueue_lim kernel boot parameter. For example,
> setting this parameter to the value 4 will use only CPUs 0 and 4, with
> CPU 4 getting three times the callback load of CPU 0.
>
> This commit assumes that bit (nr_cpu_ids - 1) is always set in
> cpu_possible_mask.
>
> Link: https://lore.kernel.org/lkml/CANn89iKaNEwyNZ=L_PQnkH0LP_XjLYrr_dpyRKNNoDJaWKdrmg@xxxxxxxxxxxxxx/
> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index 65d6e21a607a..44977c6a1cb8 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -273,7 +273,9 @@ static void call_rcu_tasks_iw_wakeup(struct irq_work *iwp)
>  static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
>  				   struct rcu_tasks *rtp)
>  {
> +	int chosen_cpu;
>  	unsigned long flags;
> +	int ideal_cpu;
>  	unsigned long j;
>  	bool needadjust = false;
>  	bool needwake;
> @@ -283,8 +285,9 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
>  	rhp->func = func;
>  	local_irq_save(flags);
>  	rcu_read_lock();
> -	rtpcp = per_cpu_ptr(rtp->rtpcpu,
> -			    smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
> +	ideal_cpu = smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift);
> +	chosen_cpu = cpumask_next(ideal_cpu - 1, cpu_possible_mask);
> +	rtpcp = per_cpu_ptr(rtp->rtpcpu, chosen_cpu);
>  	if (!raw_spin_trylock_rcu_node(rtpcp)) { // irqs already disabled.
>  		raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
>  		j = jiffies;
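
For readers following the fix: cpumask_next(n, mask) returns the first
set bit strictly greater than n, so probing from ideal_cpu - 1 keeps
ideal_cpu itself whenever it is present and otherwise falls forward to
the next possible CPU. A small userspace sketch of that fallback, with
a hand-rolled next_set_bit() standing in for the kernel's
cpumask_next() and a made-up mask:

#include <stdio.h>

/* First set bit strictly greater than n, or -1 if none: a toy
 * stand-in for cpumask_next().
 */
static int next_set_bit(int n, unsigned long mask)
{
	int bit;

	for (bit = n + 1; bit < (int)(8 * sizeof(mask)); bit++)
		if (mask & (1UL << bit))
			return bit;
	return -1;
}

int main(void)
{
	/* CPUs 0, 4, 8, 12 and 16 possible, as in the commit message. */
	const unsigned long possible = (1UL << 0) | (1UL << 4) |
				       (1UL << 8) | (1UL << 12) | (1UL << 16);
	int ideal_cpu;

	for (ideal_cpu = 0; ideal_cpu <= 4; ideal_cpu++)
		/* Prints 0 -> 0, 1 -> 4, 2 -> 4, 3 -> 4, 4 -> 4: a hole
		 * in the mask is skipped instead of being dereferenced.
		 */
		printf("ideal %d -> chosen %d\n", ideal_cpu,
		       next_set_bit(ideal_cpu - 1, possible));
	return 0;
}

This also shows why the commit message insists that bit (nr_cpu_ids - 1)
be set in cpu_possible_mask: ideal_cpu can never exceed nr_cpu_ids - 1,
so the forward search is guaranteed to land on a real CPU rather than
running off the end of the mask.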