Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu

From: Zhouyi Zhou
Date: Sun Nov 27 2022 - 22:00:58 EST


Thank you all for your guidance and encouragement!

I have learned how to construct a commit message properly, and how
important a role the torture test framework plays for the Linux kernel.
I hope my work can be of benefit to the community.

I will continue to study this topic and the torture test framework,
and I look forward to your further instructions.

Best Regards
Zhouyi

On Mon, Nov 28, 2022 at 1:53 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote:
>
> [ . . . ]
>
> > >> No. We are not exporting this just to make a bogus test case happy.
> > >>
> > >> Fix the torture code to handle -EBUSY correctly.
> > > I am going to do a study on this, for now, I do a grep in the kernel tree:
> > > find . -name "*.c"|xargs grep cpuhp_setup_state|wc -l
> > > The result of the grep command shows that there are 268
> > > cpuhp_setup_state* cases.
> > > which may make our task more complicated.
> >
> > Why? The whole point of this torture thing is to stress the
> > infrastructure.
>
> Indeed.
>
> > There are quite some reasons why a CPU-hotplug or a hot-unplug operation
> > can fail, which is not a fatal problem, really.
> >
> > So if a CPU hotplug operation fails, then why can't the torture test
> > just move on and validate that the system still behaves correctly?
> >
> > That gives us more coverage than just testing the good case and giving
> > up when something unexpected happens.
>
> Agreed, with access to a function like the tick_nohz_full_timekeeper()
> suggested earlier in this email thread, then yes, it would make sense to
> try to offline the CPU anyway, then forgive the failure in cases where
> the CPU matches that indicated by tick_nohz_full_timekeeper().
>
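The "offline anyway, then forgive the failure" idea can be sketched as
a small userspace C model. This is not kernel code and not the torture
framework's implementation. Every model_* name is hypothetical, and
model_tick_nohz_full_timekeeper() only stands in for the
tick_nohz_full_timekeeper() helper suggested earlier in this thread,
which is assumed here rather than taken from an existing kernel API.
The model also assumes that an offline request for the timekeeper CPU
fails with -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical model state: CPU 0 plays the nohz_full timekeeper. */
static int model_timekeeper_cpu = 0;
static bool model_cpu_online[MODEL_NR_CPUS] = { true, true, true, true };

/* Assumed stand-in for the helper proposed in this thread. */
static int model_tick_nohz_full_timekeeper(void)
{
	return model_timekeeper_cpu;
}

/* Model of an offline request: the timekeeper CPU refuses with -EBUSY. */
static int model_remove_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (cpu == model_timekeeper_cpu)
		return -EBUSY;
	if (!model_cpu_online[cpu])
		return -EINVAL;
	model_cpu_online[cpu] = false;
	return 0;
}

struct model_offline_stats {
	int attempts;
	int successes;
	int forgiven;	/* expected -EBUSY on the timekeeper CPU */
	int unexpected;	/* any other failure: worth reporting */
};

/*
 * Attempt the offline unconditionally.  A failure is forgiven only
 * when it is -EBUSY and the CPU is the one the timekeeper helper
 * reports; the test keeps running in every case.
 */
static void model_torture_offline(int cpu, struct model_offline_stats *st)
{
	int ret;

	st->attempts++;
	ret = model_remove_cpu(cpu);
	if (ret == 0)
		st->successes++;
	else if (ret == -EBUSY && cpu == model_tick_nohz_full_timekeeper())
		st->forgiven++;
	else
		st->unexpected++;
}
```

The point of the design is that the test never skips the timekeeper
CPU in advance: the offline path is still exercised, and only the one
expected failure is counted separately from real errors.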
> > I even argue that the torture test should inject random failures into
> > the hotplug state machine to achieve extended code coverage.
>
> I could imagine torture_onoff() telling various CPU-hotplug notifiers
> to refuse the transition using some TBD interface. That would better
> test the CPU-hotplug common code's ability to deal with failures.
>
> Or did you have something else/additional in mind?
>
> Thanx, Paul