Re: [PATCH v4 0/3] x86, apic, kexec: Add disable_cpu_apic kernel parameter

From: HATAYAMA Daisuke
Date: Sun Nov 10 2013 - 23:52:05 EST


(2013/11/07 4:02), jerry.hoemann@xxxxxx wrote:
On Wed, Oct 23, 2013 at 12:01:18AM +0900, HATAYAMA Daisuke wrote:
This patch set allows the kdump 2nd kernel to wake up multiple CPUs
even if the 1st kernel crashes on some AP. It continues the work from:

[PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
https://lkml.org/lkml/2013/10/16/300.

In this version, the basic design has changed. Users now need to figure
out the initial APIC ID of the BSP in the 1st kernel and configure the
2nd kernel manually, using the disable_cpu_apic kernel parameter
newly introduced in this patch set. This design is
more flexible than the previous version in that we no longer have to
rely on the ACPI/MP table to get the initial APIC ID of the BSP.

Sorry, this patch set does not yet include the in-source documentation
requested by Borislav Petkov. I'll post it separately later,
so that the review can focus on the documentation.

ChangeLog

v3 => v4)

- Rebased on top of v3.12-rc6

- Basic design has been changed. Users now need to figure out the initial
APIC ID of the BSP in the 1st kernel and configure the 2nd kernel
manually, using the disable_cpu_apic kernel parameter
newly introduced in this patch set. This design is more flexible
than the previous version in that we no longer have to rely on the
ACPI/MP table to get the initial APIC ID of the BSP.
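
The manual step described in the ChangeLog entry above could be scripted
roughly as follows. This is only a sketch under two assumptions that the
cover letter does not state: that the BSP shows up as logical processor 0
in the 1st kernel's /proc/cpuinfo, and that the parameter is written as
disable_cpu_apic=<initial APIC ID>. The helper names are hypothetical.

```python
def read_cpuinfo(path="/proc/cpuinfo"):
    """Return the raw text of /proc/cpuinfo (x86 layout assumed)."""
    with open(path) as f:
        return f.read()


def initial_apicid_of(cpuinfo_text, processor=0):
    """Return the 'initial apicid' listed for one logical processor.

    Assumption: the BSP is logical processor 0 in the 1st kernel.
    """
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor blocks
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "initial apicid" and current == processor:
            return int(value)
    raise ValueError("no 'initial apicid' found for processor %d" % processor)


def capture_kernel_parameter(apicid):
    """Format the parameter for the 2nd kernel's command line.

    Assumption: the syntax is disable_cpu_apic=<initial APIC ID>.
    """
    return "disable_cpu_apic=%d" % apicid
```

In the 1st kernel, capture_kernel_parameter(initial_apicid_of(read_cpuinfo()))
would give the string to append to the 2nd kernel's command line when
loading it with kexec.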



Daisuke,

I have backported version 4 of this patch to both 2.6.32- and 3.0.80-
based kernels and distros and tested it on a prototype system. (I have
previously tested versions 1 and 3 as well.)

The systems are configured to boot the capture kernel 8-way parallel.
However, I am running makedumpfile single threaded.

Panic is induced via "echo c > /proc/sysrq-trigger". This is done
under various system loads and on random cpus. I have done over a
thousand dumps total during this testing.
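
One way to pick the crashing CPU at random is sketched below. The mail
does not say how the CPU was selected, so pinning the writer with
taskset is an assumption, and the helper names are hypothetical. The
functions only build the command; they do not run it.

```python
import random


def pick_cpu(online_cpus, rng=random):
    """Choose one CPU number at random from the given online CPUs."""
    return rng.choice(list(online_cpus))


def build_crash_command(cpu):
    """Build (but do not run) a command that writes 'c' to sysrq-trigger
    from a shell pinned to the given CPU.

    Assumption: pinning with 'taskset -c' is how the panic is steered
    to a particular CPU; the original test method is not stated.
    """
    return ["taskset", "-c", str(cpu),
            "sh", "-c", "echo c > /proc/sysrq-trigger"]
```

Running the resulting command as root panics the machine immediately, so
it should only be passed to something like subprocess.call() on a system
that is set up for kdump testing.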


Thanks for your testing.

I have seen no issues w/ the 3.0.80 dump testing on our proto.

In the 2.6.32 testing on our proto, I have hit a low-probability (< 5%)
soft lockup hang of the capture kernel during
"Switching to clocksource hpet." I have not root-caused this yet.
Note, I have seen this issue with earlier versions of the patch, so
it is not specific to this version.

I then tested the 2.6.32 port on a dl380. This worked without issue.

Note, I have seen no issues related to this patch on our proto when
booting the capture with a single processor.

While I am still pursuing the issue of the 2.6.32 kernel on our proto,
I believe this patch is good and should be accepted.


It seems this depends on something in the system you used. However, I
have never verified my patch set on a 2.6.32-based kernel. I'll try to
do a similar test on some FJ systems.

By the 2.6.32-based kernel, you mean one of the Longterm release kernels,
right? So the test used a 2.6.32-based Longterm release kernel
with my v4 patch applied, right?

The root cause seems to have already been fixed in recent kernels, since
you didn't see the bug on the 3.0.80-based kernel, so I think a binary
search would be useful.

--
Thanks.
HATAYAMA, Daisuke
