Re: [v3,1/2] rcutorture: Perform more frequent testing of ->gpwrap
From: Paul E. McKenney
Date: Mon Apr 14 2025 - 12:18:02 EST
On Mon, Apr 14, 2025 at 10:56:24AM -0400, Joel Fernandes wrote:
> On 4/14/2025 10:24 AM, Paul E. McKenney wrote:
> > On Mon, Apr 14, 2025 at 08:07:24AM -0400, Joel Fernandes wrote:
> >>> On Apr 11, 2025, at 3:18 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>> On Fri, Apr 11, 2025 at 05:36:32AM -0000, Joel Fernandes wrote:
> >>>> Hello, Paul,
> >>>>
> >>>>> On Fri, 11 Apr 2025 05:33:16 GMT, "Paul E. McKenney" wrote:
> >>>>> On Thu, Apr 10, 2025 at 11:54:13AM -0700, Paul E. McKenney wrote:
> >>>>>> On Thu, Apr 10, 2025 at 11:29:03AM -0700, Paul E. McKenney wrote:
> >>>>>>> On Thu, Apr 10, 2025 at 11:03:27AM -0400, Joel Fernandes wrote: >
> >>>>>>> Currently, the ->gpwrap is not tested (at all per my testing) due to
> >>>>>>> the > requirement of a large delta between a CPU's rdp->gp_seq and its
> >>>>>>> node's > rnp->gpseq. > > This results in no testing of ->gpwrap being
> >>>>>>> set. This patch by default > adds 5 minutes of testing with ->gpwrap
> >>>>>>> forced by lowering the delta > between rdp->gp_seq and rnp->gp_seq to
> >>>>>>> just 8 GPs. All of this is > configurable, including the active time for
> >>>>>>> the setting and a full > testing cycle. > > By default, the first 25
> >>>>>>> minutes of a test will have the _default_ > behavior there is right now
> >>>>>>> (ULONG_MAX / 4) delta. Then for 5 minutes, > we switch to a smaller delt
> >>>>> a
> >>>>>>> causing 1-2 wraps in 5 minutes. I believe > this is reasonable since we
> >>>>>>> at least add a little bit of testing for > usecases where ->gpwrap is se
> >>>>> t.
> >>>>>>>>> Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
> >>>>>>>
> >>>>>>> Much better, thank you!
> >>>>>>>
> >>>>>>> One potential nit below. I will run some tests on this version.
> >>>>>>
> >>>>>> And please feel free to apply the following to both:
> >>>>>>
> >>>>>> Tested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >>>>>
> >>>>> And this happy situation lasted only until I rebased onto v6.15-rc1 and
> >>>>> on top of this commit:
> >>>>>
> >>>>> 1342aec2e442 ("Merge branches 'rcu/misc-for-6.16', 'rcu/seq-counters-for-6.1
> >>>>> 6' and 'rcu/torture-for-6.16' into rcu/for-next")
> >>>>>
> >>>>> This got me the splat shown below when running rcutorture scenario SRCU-N.
> >>>>> I reverted this commit and tests pass normally.
> >>>>>
> >>>>> Your other commit (ARM64 images) continues working fine.
> >>>>
> >>>> Interesting.. it seems to be crashing during statistics printing.
> >>>>
> >>>> I am wondering if the test itself uncovered a bug or the bug is in the test
> >>>> itself.
> >>>
> >>> Both are quite possible, also a bug somewhere else entirely.
> >>
> >> I may not get to debugging it for this merge window so I am leaning to defer it.
> >
> > The usual cause is use of an rcu_torture_ops function pointer without
> > having first checked that it is non-NULL. But I suspect that you already
> > checked for this.
>
> Oops, I am not! You are right I think it broke since the movement into ops and
> needs this which I missed for this call site (though I did it for the other). I
> could repro SRCU-N without it and with the fix it passes:
>
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 74de92c3a9ab..df6504a855aa 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -2412,7 +2412,8 @@ rcu_torture_stats_print(void)
> pipesummary[i] += READ_ONCE(per_cpu(rcu_torture_count,
> cpu)[i]);
> batchsummary[i] += READ_ONCE(per_cpu(rcu_torture_batch,
> cpu)[i]);
> }
> - n_gpwraps += cur_ops->get_gpwrap_count(cpu);
> + if (cur_ops->get_gpwrap_count)
> + n_gpwraps += cur_ops->get_gpwrap_count(cpu);
> }
> for (i = RCU_TORTURE_PIPE_LEN; i >= 0; i--) {
> if (pipesummary[i] != 0)
>
> I will squash the fix into the patch and repost as v4.
Been there, done that! And looks like a proper fix to me.
Thanx, Paul
> >>>> Looking forward to your test with the other patch and we could hold off on this
> >>>> one till we have more data about what is going on.
> >>>
> >>> This one got lot of OOMs when tests of RCU priority boosting overlapped
> >>> with testing of RCU callback flooding on TREE03, as in 13 of the 14
> >>> 9-hour runs. Back on v6.14-rc1, these were quite rare.
> >>>
> >>> Ah, and I am carrying this as an experimental patch:
> >>>
> >>> 269b9b5be09d ("EXP sched: Disable DL server if sysctl_sched_rt_runtime is -1")
> >>>
> >>> Just checking to see if this is still something I should be carrying.
> >>
> >> I think since it exposing boost issues, we should carry it! However since it is also noisy, maybe for short term we not carry it in any trees since we are getting close to posting the topic branches.
> >
> > I am carrying it in -rcu, but marked "EXP" so that I don't post it or
> > send it along in a pull request.
>
> Sounds good to me.
>
> >> Do you see the same boost issues or frequency of them when carrying it on 6.15-rc1 without any of this merge windows changes?
> >>
> >> By the way I have to rewrite that EXP patch at some point based on a review of it but functionally that patch is good.
> >
> > I just now started a short test with it reverted.
> >
> > Oh, and yours and Boqun's latest passed overnight tests except for a
> > few Kconfig issues including the PREEMPT_RT pair:
> >
> > 75cf58ef310a ("Merge branches 'rcu/misc-for-6.16', 'rcu/seq-counters-for-6.16' and 'rcu/torture-for-6.16' into rcu/for-next")
> >
> > This is a known Kconfig issue in torture.sh, fixed by this -rcu commit:
> >
> > 2e26af16b7b6 ("torture.sh: Force CONFIG_RCU_NOCB_CPU=y for --do-rt configurations")
>
> I already merged this change
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=rcu/for-next&id=a3204f778cf7e37c7344404768398b4f9d43a368
>
> But you saw issues in your testing even with this?
>
> Could you rebase on top of my for-next branch so we are on the same page? Tag
> next.2025.04.11a
>
> I believe you said you were going to rebuild your tree, but were waiting on testing?
>
> >
> > There are also Kconfig issues with a few of the KCSAN rcutorture scenarios
> > that I am looking into. And torture.sh needs to be more aggressive about
> > reporting these...
> Ok sounds good, happy to add these to my torture-for-6.16 topic branch once you
> post them.
>
> thanks,
>
> - Joel
>
>