Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

From: Neil Horman
Date: Tue Nov 27 2007 - 10:40:16 EST


On Tue, Nov 27, 2007 at 07:56:44AM -0700, Eric W. Biederman wrote:
> Andi Kleen <ak@xxxxxxx> writes:
>
> > this is any less reliable than what we have currently.
> >>
> >> It doesn't make things more reliable, and it adds code to a code path
> >> that already has too much code to be solidly reliable (thus your
> >> problem).
> >>
> >> Putting the system back in PIC legacy mode on the kexec on panic path
> >> was supposed to be a short term hack until we could remove the need
> >> by always delivering interrupts in apic mode.
> >>
> >> If you can't root cause your problem and figure out how the apics
> >> are misconfigured for legacy mode
> >
> > Probably legacy mode always routes to CPU #0. Makes sense and is
> > not really a misconfiguration of legacy mode.
>
> Possible. So far I have not seen a hardware setup that would force
> interrupts to cpu #0 in legacy mode. But I would not be truly
> surprised if it happened that there was hardware that only worked that
> way.
>

That would certainly explain the behavior I am observing here.
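
To make the hypothesis concrete, here is a toy user-space model (not kernel
code) of what would happen if legacy mode really does route everything to
CPU #0. The fixed routing is exactly the assumption under discussion, and
all names in it (struct cpu, legacy_irq_target, kdump_ticks_seen) are made
up for illustration; none of them exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

struct cpu {
    bool irqs_enabled;   /* false once the CPU is halted with interrupts off */
    unsigned int ticks;  /* timer interrupts this CPU actually serviced */
};

/* ASSUMPTION: in legacy PIC mode every interrupt is routed to CPU #0. */
int legacy_irq_target(void)
{
    return 0;
}

/* Deliver one timer interrupt. Returns the CPU that took it, or -1 if lost. */
int deliver_timer_irq(struct cpu *cpus)
{
    int target = legacy_irq_target();

    if (!cpus[target].irqs_enabled)
        return -1;       /* target CPU is halted with interrupts disabled */
    cpus[target].ticks++;
    return target;
}

/*
 * Model a panic on crash_cpu: every other CPU is halted with interrupts
 * disabled and the kdump kernel runs on crash_cpu only. Returns how many
 * of nr_irqs timer interrupts the kdump CPU actually sees.
 */
unsigned int kdump_ticks_seen(int crash_cpu, int nr_irqs)
{
    struct cpu cpus[NR_CPUS] = { { false, 0 } };
    int i;

    assert(crash_cpu >= 0 && crash_cpu < NR_CPUS);
    cpus[crash_cpu].irqs_enabled = true;

    for (i = 0; i < nr_irqs; i++)
        deliver_timer_irq(cpus);

    return cpus[crash_cpu].ticks;
}
```

Under that assumption, kdump_ticks_seen(0, 10) sees every tick, while
kdump_ticks_seen(2, 10) sees none, which is consistent with a kdump kernel
hanging while it waits for timer interrupts on a non-boot cpu.
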

> > But if CPU #0 has interrupts disabled no interrupts get delivered.
> >
> > So choices are:
> > - Move to CPU #0
> > - Do not use legacy mode during shutdown.
> (Do not use legacy mode in the kdump kernel. Removing it from shutdown
> is just a minor optimization.)
> > - Or do not rely on interrupts after enabling legacy mode
> > - Or do not disable interrupts on the other CPUs when they're
> > halted.
> >
> > First and last option are probably unreliable for the kdump case.
> > Second or third sound best.
> >
> > I suspect the real fix would be to enable IOAPIC mode really
> > early and never use the timers in legacy mode. Then the kdump
> > kernel wouldn't care about the legacy mode pointing to the wrong CPU.
>
> Exactly. If we can work out the details that should be a much more reliable
> mode of operation.
>
> > IIRC Eric even had a patch for that a long time ago, but it broke some
> > things so it wasn't included. But perhaps it should be revisited.
>
> My real problem was that the failure case was obscure (a bad interaction
> with ACPI on Linus's laptop) and I didn't have the time to track it
> down when it showed up.
>
> My patch had two parts. Some cleanups to allow the code to be enabled
> early, and the actual early enable. I figure if we can get the
> cleanups in one major kernel version and then in the next enable
> the apic mode before we start getting interrupts we should be in good
> shape.
>
> I expect with x86 becoming an embedded platform with multiple cpus we
> may start seeing systems that don't actually support legacy PIC mode
> for interrupt delivery.

Do you have a pointer to the old patch set? I'd like to try it out on the
failing system here.

Regards
Neil

>
> Eric