Re: [PATCH v4 0/7] x86: BSP or CPU0 online/offline

From: Ingo Molnar
Date: Wed Dec 07 2011 - 02:42:32 EST



* Yu, Fenghua <fenghua.yu@xxxxxxxxx> wrote:

> > When you take it down for maintenance eventually, you don't
> > need to suspend but simply poweroff.
>
> Agree with you. To maintain a system with a bad CPU, either
> you hot plug or hot replace the CPU, or you power off then
> replace the CPU. Replacing the CPU between suspend and resume
> doesn't seem a normal RAS behavior.

More importantly, you generally *cannot* realistically continue
with a bad CPU anyway - the system will crash or show signs of
corruption, and you *want* a full powerdown and a clean
reboot.

The use cases for real CPU hotplug look pretty limited to me:

- Special hardware environments that are deeply redundant and
can warn about 'soft' failures well before hard failures,
which gives a realistic window of time for a maintenance
hot-swap. [Such hardware actually exists; I even worked with
an x86 one eons ago.]

- Swapping slower CPUs for faster CPUs, without any downtime.
Given that mixed steppings and mixed frequencies are
generally pretty unpredictable even with no hotswap in the
picture, I can see hw designers (and QA test matrix
engineers) cringing at the idea.

Thanks,

Ingo