Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

From: Ingo Molnar
Date: Wed Apr 17 2013 - 03:47:08 EST



* Robin Holt <holt@xxxxxxx> wrote:

> > reboot_cpu_id = cpumask_first(cpu_online_mask);
> >
> > > Also, does this codepath prevent hotplug from going on in parallel?
> >
> > Not sure. I have not considered hotplug. I will look that over when I
> > am in the office.
>
> OK. I have been mulling this over for a bit and I don't think I
> understand what you are asking.

Well, I just saw the apparently naked use of cpu_online_mask, and asked myself
whether that's safe against hotplug.

Upstream we had two methods:

- historical: just reboot on any random CPU we happen to run on
- current: offline all nonboot CPUs then reboot on the boot CPU

Both methods were implicitly "CPU hotplug safe", no locking needed: the first
didn't need any, and the second used disable_nonboot_cpus() which is a hotplug
safe method.

Now your patches change this to:

- migrate to CPU#0 [if possible] and reboot there

Given that CPU hotplug might be executing on any given CPU while reboot is
running on another, you have to consider the interactions. The historic and
current upstream methods were reasonably hotplug safe - yours I'm not sure
about, there's just no hotplug locking in it, etc?

> I would expect that if an architecture depends upon a certain cpu for
> shutdown/reboot/halt/suspend/hibernate and that support has been compiled in,
> then the arch should be preventing that cpu from being removed. I do not know
> how that would work and think that is far beyond the scope of the initial
> problem I have been trying to solve. If that is your question, I certainly do
> not know how to address it. I get the feeling this is off your mark due to the
> "parallel" wording above.

What you mention here should indeed already be handled by the architecture hotplug
code (for example on x86 the boot CPU cannot be hot-removed).

But beyond that, your use of cpu_online_mask is AFAICS not hotplug-safe. For
example a possible race would be that another CPU might be hot-unplugging a CPU
while you try to reboot-migrate to it.
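
To make that race concrete, here is a small userspace sketch. It is not kernel
code: every name in it (cpu_online[], hotplug_read_lock(), toy_cpu_down(),
first_online_cpu()) is a hypothetical stand-in, and the "lock" is a plain
counter so the behaviour is deterministic. It only illustrates the idea that
the target CPU has to be picked, and kept, while hotplug is held off, in the
way get_online_cpus()/put_online_cpus() would be used in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Toy stand-ins for kernel state; all names here are hypothetical. */
static bool cpu_online[NR_CPUS] = { true, true, true, true };
static int hotplug_readers;	/* > 0 means hotplug is currently held off */

static void hotplug_read_lock(void)
{
	hotplug_readers++;
}

static void hotplug_read_unlock(void)
{
	assert(hotplug_readers > 0);
	hotplug_readers--;
}

/*
 * Model of an unplug request. It is refused while a reader holds hotplug
 * off. (Assumption of this sketch: a real kernel would make the unplug
 * wait rather than fail it.)
 */
static int toy_cpu_down(int cpu)
{
	if (hotplug_readers > 0)
		return -1;
	cpu_online[cpu] = false;
	return 0;
}

/*
 * Return the lowest-numbered online CPU, or -1 if none is online.
 * The result is only stable while the caller holds the hotplug read
 * lock; without it the CPU can go away right after this returns.
 */
static int first_online_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu])
			return cpu;
	return -1;
}
```

A caller would take hotplug_read_lock(), call first_online_cpu(), migrate to
the result, and only then drop the lock. Picking the CPU without the lock
leaves a window in which toy_cpu_down() can succeed on the chosen target.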

Thanks,

Ingo