Re: [PATCH v2] kexec: Fix kdump failure with notsc

From: Baoquan He
Date: Wed Jul 13 2016 - 05:06:16 EST


On 07/13/16 at 07:46am, Wei, Jiangang wrote:
> On Mon, 2016-07-11 at 18:28 +0800, Wei Jiangang wrote:
> > Hi , Ingo
> >
> > On Fri, 2016-07-08 at 09:38 +0200, Ingo Molnar wrote:
> > > * Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> > >
> > > > Sigh. Can we please just do the work to rip out the apic shutdown code from the
> > > > kexec on panic code path?
> > > >
> > > > I forgetting details but the only reason we have do any apic shutdown is bugs in
> > > > older kernels that could not initialize a system properly if we did not shut
> > > > down the apics.
> > > >
> > > > I certainly don't see an issue with goofy cases like notsc not working on a
> > > > crash capture kernel if we are not initializing the hardware properly.
> > > >
> > > > The strategy really needs to be to only do the absolutely essential hardware
> > > > shutdown in the crashing kernel, every adintional line of code we execute in the
> > > > crashing kernel increases our chances of hitting a bug.
> > >
> > > Fully agreed.
> > >
> > > > Under that policy things like requring we don't pass boot options that inhibit
> > > > the dump catpure kernel from initializing the hardware from a random state are
> > > > reasonable requirements. AKA I don't see any justification in this as to why we
> > > > would even want to support notsc on the dump capture kernel. Especially when
> > > > things clearly work when that option is not specified.
> > >
> > > So at least on the surface it appears 'surprising' that the 'notsc' option (which,
> > > supposedly, disables TSC handling) interferes with being able to fully boot. Even
> > > if 'notsc' is specified we are still using the local APIC, right?
> >
> > In most case, It's no problem that using local APIC while notsc is
> > specified.
> > But not for kdump.
> >
> > We can get evidence, Especially from "Spurious LAPIC timer interrupt on
> > cpu 0".
> >
> > ###serial log,
> >
> > [ 0.000000] NR_IRQS:524544 nr_irqs:256 16
> > [ 0.000000] Spurious LAPIC timer interrupt on cpu 0
> > [ 0.000000] Console: colour dummy device 80x25
> > [ 0.000000] console [tty0] enabled
> > [ 0.000000] console [ttyS0] enabled
> > [ 0.000000] tsc: Fast TSC calibration using PIT
> > [ 0.000000] tsc: Detected 2099.947 MHz processor
> > [ 0.000000] Calibrating delay loop...
> >
> >
> > Due to the local apic and local apic timer hasn't been setup and enabled
> > fully, The event_handler of clock event is NULL.
> >
> > ###codes,
> >
> > static void local_apic_timer_interrupt(void)
> > {
> > int cpu = smp_processor_id();
> > struct clock_event_device *evt = &per_cpu(lapic_events, cpu);
> >
> > /*
> > * Normally we should not be here till LAPIC has been initialized
> > but
> > * in some cases like kdump, its possible that there is a pending
> > LAPIC
> > * timer interrupt from previous kernel's context and is delivered
> > in
> > * new kernel the moment interrupts are enabled.
> > *
> > * Interrupts are enabled early and LAPIC is setup much later, hence
> > * its possible that when we get here evt->event_handler is NULL.
> > * Check for event_handler being NULL and discard the interrupt as
> > * spurious.
> > */
> > if (!evt->event_handler) {
> > pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", cpu);
> > /* Switch it off */
> > lapic_timer_shutdown(evt);
> > return;
> > }
> >
> > .............
> > }
> >
> >
> > IMHO,
> > If we specify notsc, the dump-capture kernel waits for jiffies being
> > updated early and LAPIC and timer are setup much later, which causes no
> > timer interrupts is passed to BSP. as following,
> >
> > start_kernel -->
> > 1)-> calibrate_delay() -> calibrate_delay_converge() # hang and wait
> > for jiffies changed
> >
> > 2)-> rest_init() -> kernel_init() -> .... -> apic_bsp_setup() ->
> > setup_local_APIC()
> >
> > -> setup_percpu_clockev().
> >
> > the setup_percpu_clockev points setup_boot_APIC_clock() which used to
> > setup the boot APIC and timer.
> >
> >
> > > So it might be a good idea to find the root cause of this bootup fragility even if
> > > 'notsc' is specified. And I fully agree that it should be fixed in the bootup path
> > > of the dump kernel, not the crash kernel reboot path.
> >
>
> Can anyone give some advice or commet on the following ideaï
> Thanks in advance.

You can't do this.

The reason is disable_IO_APIC not only disable io-apic by masking each
pin but setup the apic virtual wire mode.

I dig code and found somethings really interesting. lapic_shutdown()
really disable local apic by writing the APIC software enable/disable
flag in the spurious-interrupt vector register, please check intel arch
manual section 10.4.3. However disable_IO_APIC() softwae enable local
apic again though just make it in virtual wire mode. About virtual wire
mode, in x86 32 IMCR is used to control PIC mode or APIC mode system can
choose. In x86 64 it seems that system will go to virtual wire mode and
then enter into APIC mode. In virtual wire mode the IRET entry of pin which
connected i8259 will be setup as pass through, and LINTIN0 will be set
as ExtMode to get the interrupt into cpu.

This can explain why in commit 522e66464467 ("x86/apic: Disable I/O APIC
before shutdown of the local APIC") Fenghua decided to call
disable_IO_APIC firstly, because disable_IO_APIC enable apic again to go
back into Virtual Wire compatibility mode. In commit log he said lapic
is disabled, but io-apic is active again.

And this also explains why you need call disable_IO_APIC twice to make
notsc work again. With commit 522e664 local apic is disabled completely,
so you have to call disable_IO_APIC() to go into virtual wire mode, this
software enable local apic.

Now I don't understand why specifing notsc lead to system hang. I am
digging clocksource code, got some finding, but not very clear.
Seems local_apic_timer_interrupt give me some clues, still not clear.
How about not hurry to fix this, let's get together to make this clear
and then make a final fix. At least disable_IO_APIC have to be changed
to make it more reasonable, from the name of function we can't see what
it really is and it really does.

If we know why notsc cause system hang, it should be clear. TSC is the
highest resolution clock source, actually on my system there are clock
sources like jiffies, refined_jiffies, hpet, tsc, pit. Why tsc?

Thanks
Baoquan

>
> > Because the lapic and timer are not ready when dump-capture waits them
> > to update the jiffies value. so I suggest to put APIC in legacy mode in
> > local_apic_timer_interrupt() temporarily, which in the bootup path of
> > dump kernel.
> >
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index dcb52850a28f..af3be93997ed 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -879,6 +879,7 @@ static void local_apic_timer_interrupt(void)
> > pr_warning("Spurious LAPIC timer interrupt on cpu %d\n",
> > cpu);
> > /* Switch it off */
> > lapic_timer_shutdown(evt);
> > + disable_IO_APIC();
> > return;
> > }
> >
> > And the new solution can fix the problem.
> > Whatâs your opinion about it?
> >
> > Thanks,
> > wei
> >
> > >
> > > Thanks,
> > >
> > > Ingo
> > >
> > >
> >
>
>
>
> _______________________________________________
> kexec mailing list
> kexec@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/kexec