Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu

From: Thomas Gleixner
Date: Sun Nov 27 2022 - 07:40:35 EST


Zhouyi,

On Sun, Nov 27 2022 at 10:45, Zhouyi Zhou wrote:
> On Sun, Nov 27, 2022 at 1:05 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> So, I should construct my patch as:
> We avoid ... by ...

Not "We avoid".

Avoid this behaviour by ....

>> No. We are not exporting this just to make a bogus test case happy.
>>
>> Fix the torture code to handle -EBUSY correctly.
> I am going to do a study on this, for now, I do a grep in the kernel tree:
> find . -name "*.c"|xargs grep cpuhp_setup_state|wc -l
> The result of the grep command shows that there are 268
> cpuhp_setup_state* cases.
> which may make our task more complicated.

Why? The whole point of this torture thing is to stress the
infrastructure.

There are quite a few reasons why a CPU hotplug or hot-unplug operation
can fail, and such a failure is not a fatal problem.

So if a CPU hotplug operation fails, then why can't the torture test
just move on and validate that the system still behaves correctly?

That gives us more coverage than just testing the good case and giving
up when something unexpected happens.
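
To illustrate the "move on" idea, here is a minimal userspace sketch, not
the actual torture code. Every name in it (hotplug_op_t, hp_stats,
torture_try_offline, demo_offline) is hypothetical and made up for this
example; the only thing taken from the discussion is that -EBUSY is an
expected, non-fatal outcome. A failed attempt is counted and the caller
carries on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hotplug request; not a kernel API. */
typedef int (*hotplug_op_t)(unsigned int cpu);

/* Per-run bookkeeping so failures are recorded instead of ending the test. */
struct hp_stats {
	unsigned long attempts;
	unsigned long successes;
	unsigned long tolerated;	/* expected refusals such as -EBUSY */
	unsigned long unexpected;	/* any other error code */
};

/*
 * Make one offline attempt and classify the result.
 * Returns true on success, false on any failure. It never aborts on a
 * failed operation; the caller is expected to continue with the next one.
 */
static bool torture_try_offline(hotplug_op_t op, unsigned int cpu,
				struct hp_stats *st)
{
	int ret;

	assert(op != NULL && st != NULL);

	st->attempts++;
	ret = op(cpu);
	if (ret == 0) {
		st->successes++;
		return true;
	}
	if (ret == -EBUSY)
		st->tolerated++;
	else
		st->unexpected++;
	return false;
}

/* Example operation for the sketch: CPU 0 always refuses to go offline. */
static int demo_offline(unsigned int cpu)
{
	return cpu == 0 ? -EBUSY : 0;
}
```

After a run the counters tell how many operations were refused. The test
can then check that the system still behaves correctly instead of giving
up at the first refusal.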

I would even argue that the torture test should inject random failures
into the hotplug state machine to achieve extended code coverage.
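
A rough userspace sketch of what such failure injection could look like.
This is a toy model under stated assumptions, not the kernel's hotplug
state machine: the names toy_hp, toy_bringup, toy_states_up and
pick_fail_state are hypothetical, the number of states is arbitrary, and
-EBUSY is just the error code chosen for the injected failure. The
property being checked is that a failure injected at any state leaves
nothing half set up after rollback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_STATES 5	/* arbitrary number of states for this sketch */

/* Toy model of a bring-up sequence; all names here are hypothetical. */
struct toy_hp {
	bool up[NR_STATES];	/* which states have been set up */
	int fail_at;		/* state index to fail at, -1 for none */
};

/*
 * Walk the states in order. If a failure is injected at state i, tear
 * down the states that were already set up, in reverse order, and
 * return -EBUSY.
 */
static int toy_bringup(struct toy_hp *hp)
{
	int i;

	assert(hp != NULL);

	for (i = 0; i < NR_STATES; i++) {
		if (i == hp->fail_at) {
			while (--i >= 0)
				hp->up[i] = false;
			return -EBUSY;
		}
		hp->up[i] = true;
	}
	return 0;
}

/* Count the states that are currently set up. */
static int toy_states_up(const struct toy_hp *hp)
{
	int i, n = 0;

	for (i = 0; i < NR_STATES; i++)
		n += hp->up[i];
	return n;
}

/*
 * Pick the state to fail at. A small deterministic generator is used so
 * that a failing run can be replayed from its seed.
 */
static int pick_fail_state(unsigned int *seed)
{
	*seed = *seed * 1103515245u + 12345u;
	return (int)((*seed >> 16) % NR_STATES);
}
```

A test loop would call pick_fail_state() each round, run toy_bringup()
and verify that toy_states_up() is zero after every injected failure.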

Thanks,

tglx