Re: [PATCH v2 2/2] x86, apic: Disable BSP if boot cpu is AP

From: HATAYAMA Daisuke
Date: Tue Oct 22 2013 - 07:04:28 EST


(2013/10/19 2:36), Vivek Goyal wrote:
On Wed, Oct 16, 2013 at 10:26:44AM +0900, HATAYAMA Daisuke wrote:

[..]
I am wondering if there is any attribute of cpu which we can pass to
second kernel on command line. And tell second kernel not to bring up
that specific cpu. (Say exclude_cpu=<cpu_attr>)? If this works, then
if ACPI or other mechanism don't report BSP, we could possibly assume
that cpu 0 is BSP and ask second kernel to not try to boot it.


I came up with a similar idea. If there were such a kernel option, the rest
of the processing could be implemented in user-land, i.e., get the apicid of
the BSP from /proc/cpuinfo and set it on the kernel command line of the 2nd
kernel. Is that what kexec-tools should do on Fedora/RHEL? Also, this idea
covers SFI and device tree.
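
As a rough user-land sketch of that step: the code below parses
/proc/cpuinfo-style text and builds the command-line fragment. The option
name exclude_cpu= is only the hypothetical one suggested above, not an
existing kernel parameter, and the helper names are made up for
illustration. Which logical processor is the BSP is passed in by the caller.

```python
def parse_apicids(cpuinfo_text):
    """Map logical processor number -> apicid from /proc/cpuinfo-style text."""
    apicids = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "apicid" and current is not None:
            # "initial apicid" is a different key and is ignored here
            apicids[current] = int(value)
    return apicids


def exclude_option(cpuinfo_text, bsp_processor):
    """Build the hypothetical exclude_cpu=<apicid> option for the 2nd kernel."""
    apicid = parse_apicids(cpuinfo_text)[bsp_processor]
    return f"exclude_cpu={apicid}"
```

For real use the text would come from open("/proc/cpuinfo").read(), and the
result would be appended to the 2nd kernel's command line by the kdump
tooling.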

The first reason I didn't choose this idea was that passing the value
via the command line seems rather ad hoc.

We do so many things using command line. So telling kernel not to boot
certain cpus seems ok to me.

The second reason is that in any
case it's a compromised design. Strictly speaking, we cannot get a correct
mapping of apicid to {BSP, AP} in the 1st kernel. That is, there's a class of
bugs that affect the BSP flag of each processor. For example, in a
catastrophic state, all the cpus can have the BSP flag set in the 2nd kernel
due to wrmsr instructions executed by the bug that caused the crash. In this
sense, the current implementation is less reliable than the maxcpus=1 case.
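
To make the failure mode concrete, here is a small sketch that decodes the
BSP flag (bit 8 of the IA32_APIC_BASE MSR, address 0x1B) from raw MSR values
and lists every cpu that claims to be the BSP. How the values are obtained
(for example through the msr driver) is left out, and the function names are
illustrative only.

```python
IA32_APIC_BASE = 0x1B      # MSR address, shown for reference only
BSP_FLAG = 1 << 8          # bit 8: processor is the bootstrap processor


def is_bsp(apic_base_value):
    """True if the BSP flag is set in a raw IA32_APIC_BASE value."""
    return bool(apic_base_value & BSP_FLAG)


def bsp_claimants(apic_base_by_cpu):
    """Sorted list of cpus whose IA32_APIC_BASE value has the BSP flag set."""
    return sorted(cpu for cpu, value in apic_base_by_cpu.items() if is_bsp(value))
```

In a healthy system the list has exactly one entry. After a stray wrmsr it
may have several, or none, which is the case the mapping taken in the 1st
kernel cannot account for.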

To address this rigorously we would need, for example, to check the status
of the BSP flag between the 1st kernel and the 2nd kernel to keep the
processor with the BSP flag unique, to exclude cpus in a catastrophic state
that could not be checked, and to tell the 2nd kernel which cpus can be
woken up.

Ok, for the time being let us not do any comparison with maxcpus=1 or
nr_cpus=1 because we know that's the most robust thing to do.

For the case where we want to bring up more than one cpu in second kernel,
there seem to be two problems.

- ACPI tables or other tables might not report which is BSP. In that
case we might try to bring up BSP and crash the system.

- Due to malicious wrmsr, more than one cpu might claim being BSP. In that
case the cpu we are crashing on will think it is BSP and it can safely
bring up other cpus.

If we start sending a mask of cpus which should not be brought up in
second kernel, then it would not matter whether BSP flag in MSR is set
or not. Isn't it? And that will solve the second issue.


No. As long as the mask is created in the 1st kernel, the mapping between CPUs
and {BSP, AP} could change at crash time. So the ``mask'' idea never
improves reliability.

To obtain complete reliability without any hardware support for getting the
mapping between all the CPUs and {BSP, AP}, we must create such a mask after
the crash, i.e., between the 1st and 2nd kernels, in purgatory or some other
new phase. The idea is, for example, as follows. The crashing AP waits in
purgatory for the other CPUs, either until a specified number of CPUs arrive
or until a time limit passes, in case some CPUs never arrive because they are
in a catastrophic state. The other CPUs also enter purgatory instead of
halting as in the current implementation. There they mark themselves in the
mask to indicate that they can be safely woken up in the 2nd kernel, and then
halt in purgatory until all or some of them are woken up by the 2nd kernel.
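
The rendezvous described above could be modelled roughly as below. This is
only a user-space simulation of the logic, not purgatory code: the arrival
list, the time units and the function name are all assumptions made for
illustration. It also assumes that a cpu found with the BSP flag set is
counted as arrived but left out of the mask.

```python
def build_wake_mask(arrivals, expected, timeout):
    """Simulate the crashing cpu collecting other cpus in purgatory.

    arrivals: iterable of (arrival_time, cpu, has_bsp_flag) tuples.
    expected: number of cpus the crashing cpu waits for.
    timeout:  latest arrival time that is still accepted.
    Returns a bitmask of cpus considered safe to wake in the 2nd kernel.
    """
    mask = 0
    arrived = 0
    for when, cpu, has_bsp_flag in sorted(arrivals):
        if arrived >= expected or when > timeout:
            break  # enough cpus checked in, or the time limit has passed
        arrived += 1
        if not has_bsp_flag:
            mask |= 1 << cpu  # checked after the crash, safe to wake later
    return mask
```

Cpus that never arrive before the limit simply stay out of the mask, so the
2nd kernel would not try to wake them.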

And if ACPI tables don't report which one is BSP, user space can first
try to look at BSP flags of processors (maybe this can be reported
in /proc/cpuinfo?) and if no one has BSP flag set, then assume cpu 0
is BSP.
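
A minimal sketch of that fallback rule, assuming user space has somehow
obtained a per-cpu BSP flag (the input format and the function name are
hypothetical):

```python
def cpus_to_exclude(bsp_flag_by_cpu):
    """Return the cpus the 2nd kernel should not bring up.

    bsp_flag_by_cpu maps logical cpu number -> True/False.
    If no cpu reports the BSP flag, fall back to assuming cpu 0 is the BSP.
    """
    flagged = sorted(cpu for cpu, flag in bsp_flag_by_cpu.items() if flag)
    return flagged if flagged else [0]
```

Returning a list rather than a single cpu also covers the case where more
than one cpu claims to be the BSP.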

So to me it looks like passing which cpus to not bring up to second kernel
is more resilient approach. Isn't it?


Yes. Though the reliability is similar to the current approach, the user-space
approach is better in that it doesn't depend on what kind of BIOS tables are
present in the system. Also, the idea is more general and could be applied to
other purposes, though I don't know exactly what they would be; perhaps
disabling some of the CPUs might be useful for some kind of debugging?

I'll post a new version later.

--
Thanks.
HATAYAMA, Daisuke
