Re: [PATCHv2] kexec: disable cpu hotplug until the rebooting cpu is stable

From: Pingfan Liu
Date: Fri Jan 28 2022 - 02:41:47 EST


On Thu, Jan 27, 2022 at 05:41:44PM +0800, Baoquan He wrote:
Hi Baoquan,

Thanks for reviewing. Please see my comments inline.
> Hi Pingfan,
>
> On 01/27/22 at 05:02pm, Pingfan Liu wrote:
> > The following identical code piece appears in both
> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
> >
> > if (!cpu_online(primary_cpu))
> > primary_cpu = cpumask_first(cpu_online_mask);
> >
> > This is due to a breakage like the following:
> > migrate_to_reboot_cpu();
> > cpu_hotplug_enable();
> > --> comes a cpu_down(this_cpu) on other cpu
> > machine_shutdown();
> >
> > Although the kexec-reboot task can get through a cpu_down() on its cpu,
> > this code looks a little confusing.
> >
> > Make things straight forward by keeping cpu hotplug disabled until
> > smp_shutdown_nonboot_cpus() holds cpu_add_remove_lock. By this way, the
> > breakage is squashed out and the rebooting cpu can keep unchanged.
>
> If I didn't go through code wrongly, you may miss the x86 case.
> Several ARCHes do call smp_shutdown_nonboot_cpus() in machine_shutdown()
> in kexec reboot code path, while x86 doesn't. If I am right, you may
> need to reconsider whether this patch is needed or needs to be adjusted.
>
Quoting the relevant code in kernel_kexec():

	migrate_to_reboot_cpu();

	/*
	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
	 * no further code needs to use CPU hotplug (which is true in
	 * the reboot case). However, the kexec path depends on using
	 * CPU hotplug again; so re-enable it here.
	 */
	cpu_hotplug_enable();
	pr_notice("Starting new kernel\n");
	machine_shutdown();

So maybe it can be viewed this way: x86 and ppc do not need this
cpu_hotplug_enable() call, so the patch removes it from kernel_kexec()
and moves it to a more appropriate place, smp_shutdown_nonboot_cpus(),
which is what arm64/riscv ... rely on.
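
To illustrate the ordering, here is a small userspace model in C. It is
only a sketch, not kernel code: struct hotplug_model and every model_*()
helper are hypothetical names, the counter stands in for
cpu_hotplug_disabled, and cpu_add_remove_lock is not modeled at all (the
"racing" cpu_down() is simply placed in the sequence). It assumes that a
cpu_down() attempted while hotplug is disabled is refused with -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only; the names mirror the kernel but are hypothetical. */
struct hotplug_model {
	int disabled;		/* stands in for cpu_hotplug_disabled */
	unsigned long online;	/* bit n set => CPU n is online */
};

static void model_hotplug_disable(struct hotplug_model *m)
{
	m->disabled++;
}

static void model_hotplug_enable(struct hotplug_model *m)
{
	assert(m->disabled > 0);
	m->disabled--;
}

/* Assumed behaviour: taking a CPU down is refused while hotplug is disabled. */
static int model_cpu_down(struct hotplug_model *m, unsigned int cpu)
{
	if (m->disabled)
		return -EBUSY;
	m->online &= ~(1UL << cpu);
	return 0;
}

static bool model_cpu_online(const struct hotplug_model *m, unsigned int cpu)
{
	return (m->online & (1UL << cpu)) != 0;
}

/* Current ordering: hotplug is re-enabled before machine_shutdown(). */
static bool reboot_cpu_survives_old_order(unsigned int reboot_cpu)
{
	struct hotplug_model m = { .disabled = 0, .online = 0xfUL };

	model_hotplug_disable(&m);		/* migrate_to_reboot_cpu() */
	model_hotplug_enable(&m);		/* cpu_hotplug_enable() */
	(void)model_cpu_down(&m, reboot_cpu);	/* cpu_down() from another cpu */
	return model_cpu_online(&m, reboot_cpu);	/* seen by machine_shutdown() */
}

/* Patched ordering: hotplug stays disabled until the shutdown helper runs. */
static bool reboot_cpu_survives_new_order(unsigned int reboot_cpu)
{
	struct hotplug_model m = { .disabled = 0, .online = 0xfUL };
	int ret;

	model_hotplug_disable(&m);		/* migrate_to_reboot_cpu() */
	ret = model_cpu_down(&m, reboot_cpu);	/* cpu_down() from another cpu */
	assert(ret == -EBUSY);
	model_hotplug_enable(&m);		/* in smp_shutdown_nonboot_cpus() */
	return model_cpu_online(&m, reboot_cpu);
}
```

In this model the reboot cpu can go offline under the current ordering,
while under the patched ordering the competing cpu_down() is rejected
and the reboot cpu stays online.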

> Are you optimizing code path, or you meet a real problem? I haven't
> checked v1, but I also didn't see it's told in patch log which case it
> is.
>
It is a simplification: the patch shortens the code path and makes the
logic straightforward.

Sorry for the unclear wording. I thought the commit log already said
that (quoting it here, with the key words marked):

|| The following identical code piece appears in both
^^^^^^^^
|| migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
||
|| if (!cpu_online(primary_cpu))
|| primary_cpu = cpumask_first(cpu_online_mask);
||
|| This is due to a breakage like the following:
^^^^^^^^
|| migrate_to_reboot_cpu();
|| cpu_hotplug_enable();
|| --> comes a cpu_down(this_cpu) on other cpu
|| machine_shutdown();
||
|| Although the kexec-reboot task can get through a cpu_down() on its cpu,
^^^^^^^^^^^
|| this code looks a little confusing.
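
For reference, the duplicated fallback amounts to the selection below.
This is a userspace sketch under my own assumptions: pick_reboot_cpu()
is a hypothetical helper, and a plain unsigned long bitmask stands in
for cpu_online_mask.

```c
#include <limits.h>

/*
 * Hypothetical helper: keep the preferred CPU if it is still online,
 * otherwise fall back to the lowest-numbered online CPU.
 */
static unsigned int pick_reboot_cpu(unsigned long online_mask,
				    unsigned int preferred)
{
	const unsigned int nbits = sizeof(online_mask) * CHAR_BIT;
	unsigned int cpu;

	if (preferred < nbits && (online_mask & (1UL << preferred)))
		return preferred;

	/* Stand-in for cpumask_first(cpu_online_mask). */
	for (cpu = 0; cpu < nbits; cpu++)
		if (online_mask & (1UL << cpu))
			return cpu;

	return nbits;	/* no CPU online at all */
}
```

With CPUs 0-3 online and CPU 2 preferred, the helper keeps CPU 2. Once
CPU 2 is offline it returns CPU 0. The patch makes this second step
unnecessary in smp_shutdown_nonboot_cpus(), because the preferred cpu
can no longer go offline in between.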

Should I rephrase the commit log?

Thanks,

Pingfan

> >
> > Note: this patch only affects the kexec-reboot on arches, which rely on
> > cpu hotplug mechanism.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx>
> > Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
> > Cc: Vincent Donnefort <vincent.donnefort@xxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Mark Rutland <mark.rutland@xxxxxxx>
> > Cc: YueHaibing <yuehaibing@xxxxxxxxxx>
> > Cc: Baokun Li <libaokun1@xxxxxxxxxx>
> > Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> > Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
> > Cc: kexec@xxxxxxxxxxxxxxxxxxx
> > To: linux-kernel@xxxxxxxxxxxxxxx
> > ---
> > v1 -> v2:
> > improve commit log
> >
> > kernel/cpu.c | 16 ++++++++++------
> > kernel/kexec_core.c | 10 ++++------
> > 2 files changed, 14 insertions(+), 12 deletions(-)
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 9c92147f0812..87bdf21de950 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -1240,20 +1240,24 @@ int remove_cpu(unsigned int cpu)
> > }
> > EXPORT_SYMBOL_GPL(remove_cpu);
> >
> > +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
> > void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
> > {
> > unsigned int cpu;
> > int error;
> >
> > + /*
> > + * Block other cpu hotplug event, so primary_cpu is always online if
> > + * it is not touched by us
> > + */
> > cpu_maps_update_begin();
> > -
> > /*
> > - * Make certain the cpu I'm about to reboot on is online.
> > - *
> > - * This is inline to what migrate_to_reboot_cpu() already do.
> > + * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> > + * no further code needs to use CPU hotplug (which is true in
> > + * the reboot case). However, the kexec path depends on using
> > + * CPU hotplug again; so re-enable it here.
> > */
> > - if (!cpu_online(primary_cpu))
> > - primary_cpu = cpumask_first(cpu_online_mask);
> > + __cpu_hotplug_enable();
> >
> > for_each_online_cpu(cpu) {
> > if (cpu == primary_cpu)
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index 68480f731192..db4fa6b174e3 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -1168,14 +1168,12 @@ int kernel_kexec(void)
> > kexec_in_progress = true;
> > kernel_restart_prepare("kexec reboot");
> > migrate_to_reboot_cpu();
> > -
> > /*
> > - * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> > - * no further code needs to use CPU hotplug (which is true in
> > - * the reboot case). However, the kexec path depends on using
> > - * CPU hotplug again; so re-enable it here.
> > + * migrate_to_reboot_cpu() disables CPU hotplug. If an arch
> > + * relies on the cpu teardown to achieve reboot, it needs to
> > + * re-enable CPU hotplug there.
> > */
> > - cpu_hotplug_enable();
> > +
> > pr_notice("Starting new kernel\n");
> > machine_shutdown();
> > }
> > --
> > 2.31.1
> >
>