Re: [PATCH] rcu: fix an infinite loop in rcu_gp_cleanup()

From: Qian Cai
Date: Mon Dec 16 2019 - 09:13:03 EST




> On Dec 15, 2019, at 3:16 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sun, Dec 15, 2019 at 01:52:42AM -0500, Qian Cai wrote:
>> The commit 82150cb53dcb ("rcu: React to callback overload by
>> aggressively seeking quiescent states") introduced an infinite loop
>> during boot here,
>>
>> // Reset overload indication for CPUs no longer overloaded
>> for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
>> rdp = per_cpu_ptr(&rcu_data, cpu);
>> check_cb_ovld_locked(rdp, rnp);
>> }
>>
>> because on an affected machine,
>>
>> rnp->cbovldmask = 0
>> rnp->grphi = 127
>> rnp->grplo = 0
>
> Your powerpc.config file from your first email shows:
>
> rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=128
>
> So this should be the root rcu_node structure (as opposed to one of the
> eight leaf rcu_node structures, each of which should have the difference
> between rnp->grphi and rnp->grplo equal to 15). And RCU should not be
> invoking for_each_leaf_node_cpu_mask() on this rcu_node structure.
>
>> It ends up with "cpu" is always 64 and never be able to get out of the
>> loop due to "cpu <= rnp->grphi". It is pointless to enter the loop when
>> the cpumask is 0 as there is no CPU would be able to match it.
>>
>> Fixes: 82150cb53dcb ("rcu: React to callback overload by aggressively seeking quiescent states")
>> Signed-off-by: Qian Cai <cai@xxxxxx>
>> ---
>> kernel/rcu/rcu.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
>> index ab504fbc76ca..fb691ec86df4 100644
>> --- a/kernel/rcu/rcu.h
>> +++ b/kernel/rcu/rcu.h
>> @@ -363,7 +363,7 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
>> ((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
>> #define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
>> for ((cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
>> - (cpu) <= rnp->grphi; \
>> + (cpu) <= rnp->grphi && (mask); \
>> (cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))
>
> This change cannot be right, but has to be one of the better bug reports
> I have ever received, so thank you very much, greatly appreciated!!!
>
> So if I say the above change cannot be right, what change might work?
>
> Again, it would certainly be a bug to invoke for_each_leaf_node_cpu_mask()
> on anything but one of the leaf rcu_node structures, as you might guess
> from the name. And as noted above, your dump of the rcu_node fields
> above looks like it is exactly that bug that happened. So let's review
> the uses of this macro:
>
> kernel/rcu/tree.c:
>
> o rcu_gp_cleanup() invokes for_each_leaf_node_cpu_mask() within a
> srcu_for_each_node_breadth_first() loop, which includes non-leaf
> rcu_node structures. This is a bug! Please see patch below.
>
> o force_qs_rnp() is OK because for_each_leaf_node_cpu_mask() is
> invoked within a rcu_for_each_leaf_node() loop.
>
> kernel/rcu/tree_exp.h:
>
> o rcu_report_exp_cpu_mult() invokes it on whatever rcu_node structure
> was passed in:
>
> o sync_rcu_exp_select_node_cpus() also relies on its
> caller (via workqueues) to do the right thing.
>
> o sync_rcu_exp_select_cpus() invokes
> sync_rcu_exp_select_node_cpus() from within a
> rcu_for_each_leaf_node() loop, so is OK.
>
> o sync_rcu_exp_select_cpus() also invokes
> sync_rcu_exp_select_node_cpus() indirectly
> via workqueues, but also from within the same
> rcu_for_each_leaf_node() loop, so is OK.
>
> o rcu_report_exp_rdp() invokes rcu_report_exp_cpu_mult()
> on the rcu_node structure corresponding to some
> specific CPU, which will always be a leaf rcu_node
> structure.
>
> Again, thank you very much for your testing and debugging efforts!
> I have queued the three (almost untested) patches below, the first of
> which I will fold into the buggy "rcu: React to callback overload by
> aggressively seeking quiescent states" patch, the second of which I will
> apply to prevent future bugs of this sort, even when running on small
> systems, and the third of which will allow me to easily run rcutorture
> tests on the larger systems that I have recently gained easy access to.
>
> And the big question... Does the first patch clear up your hangs? ;-)
> Either way, please let me know!
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit e8d6182b015bdd8221164477f4ab1c307bd2fbe9
> Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Date: Sun Dec 15 10:59:06 2019 -0800
>
> squash! rcu: React to callback overload by aggressively seeking quiescent states
>
> [ paulmck: Fix bug located by Qian Cai. ]
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1d0935e..48fba22 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1760,10 +1760,11 @@ static void rcu_gp_cleanup(void)
> /* smp_mb() provided by prior unlock-lock pair. */
> needgp = rcu_future_gp_cleanup(rnp) || needgp;
> // Reset overload indication for CPUs no longer overloaded
> - for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
> - rdp = per_cpu_ptr(&rcu_data, cpu);
> - check_cb_ovld_locked(rdp, rnp);
> - }
> + if (rcu_is_leaf_node(rnp))
> + for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
> + rdp = per_cpu_ptr(&rcu_data, cpu);
> + check_cb_ovld_locked(rdp, rnp);
> + }
> sq = rcu_nocb_gp_get(rnp);
> raw_spin_unlock_irq_rcu_node(rnp);
> rcu_nocb_gp_cleanup(sq);
>


This works fine.

Tested-by: Qian Cai <cai@xxxxxx>