Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection

From: Vincent Donnefort
Date: Wed Jan 20 2021 - 10:19:42 EST


On Wed, Jan 20, 2021 at 01:58:35PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@xxxxxxx wrote:
> > From: Vincent Donnefort <vincent.donnefort@xxxxxxx>
> >
> > The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
> > triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none
> > of the atomic can. Hence, rollback is not possible after a hotunplug
> > CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow
> > it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback
> > (finish_cpu()) cannot fail anyway.
> >
> > Signed-off-by: Vincent Donnefort <vincent.donnefort@xxxxxxx>
> > ---
> > kernel/cpu.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 9121edf..bcd7b2a 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev,
> > return -EINVAL;
> >
> > /*
> > - * Cannot fail STARTING/DYING callbacks.
> > + * Cannot fail STARTING/DYING callbacks. Also, those callbacks are
> > + * triggered by BRINGUP_CPU bringup callback. Therefore, the latter
> > + * can't fail during hotunplug, as it would mean we have no way of
> > + * rolling back the atomic states that have been previously torn
> > + * down.
> > */
> > - if (cpuhp_is_atomic_state(fail))
> > + if (cpuhp_is_atomic_state(fail) ||
> > + (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
> > return -EINVAL;
>
> Should we instead disallow failing any state that has .cant_stop ?

That would reduce the scope of what can be tested: bringup_cpu() and
takedown_cpu() are both marked as "cant_stop", yet both callbacks are
allowed to fail.
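
To make the difference concrete, here is a rough user-space model of the two
policies (not kernel code, untested against the tree). The enum is a trimmed,
illustrative subset of enum cpuhp_state, and model_cant_stop(),
fail_allowed_patch() and fail_allowed_cant_stop() are hypothetical names used
only for this sketch. It assumes the atomic range is
CPUHP_AP_IDLE_DEAD <= state < CPUHP_AP_ONLINE and that only the two steps
mentioned above are flagged cant_stop.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Trimmed, illustrative subset of the hotplug states. The real enum has
 * many more entries; only the relative order matters for this model.
 */
enum cpuhp_state {
	CPUHP_OFFLINE,
	CPUHP_BRINGUP_CPU,
	CPUHP_AP_IDLE_DEAD,	/* first atomic state */
	CPUHP_AP_OFFLINE,
	CPUHP_AP_ONLINE,	/* first state past the atomic range */
	CPUHP_TEARDOWN_CPU,
	CPUHP_ONLINE,
};

/* The model relies on BRINGUP_CPU sitting just below the atomic states. */
static_assert(CPUHP_BRINGUP_CPU < CPUHP_AP_IDLE_DEAD,
	      "BRINGUP_CPU must precede the atomic states");

/* Assumed bounds: CPUHP_AP_IDLE_DEAD <= state < CPUHP_AP_ONLINE. */
static bool cpuhp_is_atomic_state(enum cpuhp_state state)
{
	return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
}

/* Model assumption: only the two steps named in this thread are flagged. */
static bool model_cant_stop(enum cpuhp_state state)
{
	return state == CPUHP_BRINGUP_CPU || state == CPUHP_TEARDOWN_CPU;
}

/* Check proposed by the patch; cur stands in for st->state. */
static bool fail_allowed_patch(enum cpuhp_state fail, enum cpuhp_state cur)
{
	if (cpuhp_is_atomic_state(fail))
		return false;
	/* BRINGUP_CPU cannot be failed once the CPU is above it (hotunplug). */
	if (fail == CPUHP_BRINGUP_CPU && cur > CPUHP_BRINGUP_CPU)
		return false;
	return true;
}

/* Alternative: keep the atomic check and also refuse any cant_stop step. */
static bool fail_allowed_cant_stop(enum cpuhp_state fail)
{
	return !cpuhp_is_atomic_state(fail) && !model_cant_stop(fail);
}
```

In this model the patch still lets us inject a failure in BRINGUP_CPU on the
way up and in TEARDOWN_CPU on the way down, while the cant_stop variant
rejects both.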

Checking for cant_stop also made me notice that write_cpuhp_target() is
probably missing a check for cpuhp_is_atomic_state(). For the same reason as
in this patch, cpu_down() can't stop in one of these states.
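
Roughly what I have in mind for the target check, again as a user-space model
rather than a patch. check_hotplug_target() is a hypothetical name, the enum
is a trimmed illustrative subset of the real one, and the atomic bounds
(CPUHP_AP_IDLE_DEAD <= state < CPUHP_AP_ONLINE) are assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Trimmed, illustrative subset of the hotplug states (order matters). */
enum cpuhp_state {
	CPUHP_OFFLINE,
	CPUHP_BRINGUP_CPU,
	CPUHP_AP_IDLE_DEAD,	/* first atomic state */
	CPUHP_AP_OFFLINE,
	CPUHP_AP_ONLINE,	/* first state past the atomic range */
	CPUHP_TEARDOWN_CPU,
	CPUHP_ONLINE,
};

/* The atomic range must be non-empty for the check below to mean anything. */
static_assert(CPUHP_AP_IDLE_DEAD < CPUHP_AP_ONLINE,
	      "atomic range must be non-empty");

/* Assumed bounds: CPUHP_AP_IDLE_DEAD <= state < CPUHP_AP_ONLINE. */
static bool cpuhp_is_atomic_state(enum cpuhp_state state)
{
	return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
}

/*
 * Hypothetical model of the target validation with the extra check.
 * Returns 0 if the target may be requested, -EINVAL otherwise.
 */
static int check_hotplug_target(int target)
{
	if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE)
		return -EINVAL;
	/* We can't stop in an atomic state, so it can't be a target either. */
	if (cpuhp_is_atomic_state((enum cpuhp_state)target))
		return -EINVAL;
	return 0;
}
```

With this, CPUHP_AP_IDLE_DEAD and CPUHP_AP_OFFLINE are refused as targets,
while CPUHP_OFFLINE, CPUHP_AP_ONLINE and CPUHP_ONLINE remain valid.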

--
Vincent