Re: [PATCH v2] kexec: Fix kdump failure with notsc

From: Wei, Jiangang
Date: Mon Jul 11 2016 - 06:30:37 EST


Hi Ingo,

On Fri, 2016-07-08 at 09:38 +0200, Ingo Molnar wrote:
> * Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> > Sigh. Can we please just do the work to rip out the apic shutdown code from the
> > kexec on panic code path?
> >
> > I am forgetting details, but the only reason we have to do any apic shutdown is
> > bugs in older kernels that could not initialize a system properly if we did not
> > shut down the apics.
> >
> > I certainly don't see an issue with goofy cases like notsc not working on a
> > crash capture kernel if we are not initializing the hardware properly.
> >
> > The strategy really needs to be to only do the absolutely essential hardware
> > shutdown in the crashing kernel; every additional line of code we execute in the
> > crashing kernel increases our chances of hitting a bug.
>
> Fully agreed.
>
> > Under that policy things like requiring we don't pass boot options that inhibit
> > the dump capture kernel from initializing the hardware from a random state are
> > reasonable requirements. AKA I don't see any justification in this as to why we
> > would even want to support notsc on the dump capture kernel. Especially when
> > things clearly work when that option is not specified.
>
> So at least on the surface it appears 'surprising' that the 'notsc' option (which,
> supposedly, disables TSC handling) interferes with being able to fully boot. Even
> if 'notsc' is specified we are still using the local APIC, right?

In most cases, using the local APIC while notsc is specified is not a
problem. But it is for kdump.

The evidence is the "Spurious LAPIC timer interrupt on cpu 0" message in
the serial log below.

###serial log,

[ 0.000000] NR_IRQS:524544 nr_irqs:256 16
[ 0.000000] Spurious LAPIC timer interrupt on cpu 0
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] console [ttyS0] enabled
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2099.947 MHz processor
[ 0.000000] Calibrating delay loop...


Because the local APIC and the local APIC timer have not yet been fully
set up and enabled, the clock event device's event_handler is still NULL.

###code (arch/x86/kernel/apic/apic.c),

static void local_apic_timer_interrupt(void)
{
	int cpu = smp_processor_id();
	struct clock_event_device *evt = &per_cpu(lapic_events, cpu);

	/*
	 * Normally we should not be here till LAPIC has been initialized but
	 * in some cases like kdump, its possible that there is a pending LAPIC
	 * timer interrupt from previous kernel's context and is delivered in
	 * new kernel the moment interrupts are enabled.
	 *
	 * Interrupts are enabled early and LAPIC is setup much later, hence
	 * its possible that when we get here evt->event_handler is NULL.
	 * Check for event_handler being NULL and discard the interrupt as
	 * spurious.
	 */
	if (!evt->event_handler) {
		pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", cpu);
		/* Switch it off */
		lapic_timer_shutdown(evt);
		return;
	}

	.............
}


IMHO, if notsc is specified, the dump-capture kernel waits early in boot
for jiffies to be updated, while the LAPIC and its timer are set up much
later, so no timer interrupt is delivered to the BSP in the meantime.
The call paths are as follows:

start_kernel -->
 1) -> calibrate_delay() -> calibrate_delay_converge()  # hangs waiting for jiffies to change

 2) -> rest_init() -> kernel_init() -> .... -> apic_bsp_setup() -> setup_local_APIC()
                                            -> setup_percpu_clockev()

The setup_percpu_clockev hook points to setup_boot_APIC_clock(), which is
used to set up the boot APIC and its timer.


> So it might be a good idea to find the root cause of this bootup fragility even if
> 'notsc' is specified. And I fully agree that it should be fixed in the bootup path
> of the dump kernel, not the crash kernel reboot path.

Because the LAPIC and its timer are not ready when the dump-capture kernel
waits for them to update jiffies, I suggest temporarily putting the APIC
into legacy mode in local_apic_timer_interrupt(), which is on the bootup
path of the dump kernel:

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index dcb52850a28f..af3be93997ed 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -879,6 +879,7 @@ static void local_apic_timer_interrupt(void)
 		pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", cpu);
 		/* Switch it off */
 		lapic_timer_shutdown(evt);
+		disable_IO_APIC();
 		return;
 	}

With this change, the problem is fixed.
What's your opinion about it?

Thanks,
wei

>
> Thanks,
>
> Ingo
>
>