Re: [BUG] rcu-tasks : should take care of sparse cpu masks
From: Paul E. McKenney
Date: Mon Apr 04 2022 - 17:51:11 EST
On Thu, Mar 31, 2022 at 03:54:02PM -0700, Eric Dumazet wrote:
> On Thu, Mar 31, 2022 at 3:42 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Thu, Mar 31, 2022 at 02:45:25PM -0700, Eric Dumazet wrote:
> > > Hi Paul
> > >
> > > It seems you assume per cpu ptr for arbitrary indexes (< nr_cpu_ids) are valid.
> >
> > Gah! I knew I was forgetting something...
> >
> > But just to check, is this a theoretical problem or something you hit
> > on real hardware? (For the rest of this email, I am assuming the latter.)
>
> Code review really...
>
> >
> > > What do you think of the (untested) following patch ?
> >
> > One issue with this patch is that the contention could be unpredictable,
> > or worse, vary among CPU, especially if the cpu_possible_mask was oddly
> > distributed.
> >
> > So might it be better to restrict this to all on CPU 0 on the one hand
> > and completely per-CPU on the other? (Or all on the boot CPU, in case
> > I am forgetting some misbegotten architecture that can run without a
> > CPU 0.)
>
> If I understand correctly, cblist_init_generic() could setup
> percpu_enqueue_shift
> to something smaller than order_base_2(nr_cpu_ids)
>
> Meaning that we could reach a non zero idx in (smp_processor_id() >>
> percpu_enqueue_shift)
>
> So even if CPU0 is always present (I am not sure this is guaranteed,
> but this seems reasonable),
> we could still attempt a per_cpu_ptr(PTR, not_present_cpu), and get garbage.
And the problem with my wish to provide load balancing is that a
sparse cpumask could be sparse any which way that it wants to be.
Another problem is that, unlike TREE SRCU, Tasks RCU doesn't have an
efficient way to find all the CPUs with callbacks queued. Yes, I could
add that information, but the benefit does not seem worth the complexity.
So I took your patch after all, but changed from cpu_online_mask to
cpu_possible_mask. Thank you for bearing with me on this one!
Are you OK with your Signed-off-by on this patch as shown below?
Thanx, Paul
------------------------------------------------------------------------
commit b77b2981bb22c4449a0a6e86eeb9fbab36a2beae
Author: Eric Dumazet <edumazet@xxxxxxxxxx>
Date: Mon Apr 4 12:30:18 2022 -0700
rcu-tasks: Handle sparse cpu_possible_mask
If the rcupdate.rcu_task_enqueue_lim kernel boot parameter is set to
something greater than 1 and less than nr_cpu_ids, the code attempts to
use a subset of the CPU's RCU Tasks callback lists. This works, but only
if the cpu_possible_mask is contiguous. If there are "holes" in this
mask, the callback-enqueue code might attempt to access a non-existent
per-CPU ->rtcpu variable for a non-existent CPU. For example, if only
CPUs 0, 4, 8, 12, 16 and so on are in cpu_possible_mask, specifying
rcupdate.rcu_task_enqueue_lim=4 would cause the code to attempt to
use callback queues for non-existent CPUs 1, 2, and 3. Because such
systems have existed in the past and might still exist, the code needs
to gracefully handle this situation.
This commit therefore checks to see whether the desired CPU is present
in cpu_possible_mask, and, if not, searches for the next CPU. This means
that the systems administrator of a system with a sparse cpu_possible_mask
will need to account for this sparsity when specifying the value of
the rcupdate.rcu_task_enqueue_lim kernel boot parameter. For example,
setting this parameter to the value 4 will use only CPUs 0 and 4, which
CPU 4 getting three times the callback load of CPU 0.
This commit assumes that bit (nr_cpu_ids - 1) is always set in
cpu_possible_mask.
Link: https://lore.kernel.org/lkml/CANn89iKaNEwyNZ=L_PQnkH0LP_XjLYrr_dpyRKNNoDJaWKdrmg@xxxxxxxxxxxxxx/
Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 65d6e21a607a..44977c6a1cb8 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -273,7 +273,9 @@ static void call_rcu_tasks_iw_wakeup(struct irq_work *iwp)
static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
struct rcu_tasks *rtp)
{
+ int chosen_cpu;
unsigned long flags;
+ int ideal_cpu;
unsigned long j;
bool needadjust = false;
bool needwake;
@@ -283,8 +285,9 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
rhp->func = func;
local_irq_save(flags);
rcu_read_lock();
- rtpcp = per_cpu_ptr(rtp->rtpcpu,
- smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
+ ideal_cpu = smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift);
+ chosen_cpu = cpumask_next(ideal_cpu - 1, cpu_possible_mask);
+ rtpcp = per_cpu_ptr(rtp->rtpcpu, chosen_cpu);
if (!raw_spin_trylock_rcu_node(rtpcp)) { // irqs already disabled.
raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
j = jiffies;