Re: [PATCH] x86, kdump: No need to disable ioapic in crash path

From: Don Zickus
Date: Mon Apr 30 2012 - 16:54:05 EST


On Thu, Mar 15, 2012 at 05:16:50PM -0400, Seiji Aguchi wrote:
> Don,
>
> What do you think about following scenario?
> Disabling I/O APIC seems to be needed before booting kdump kernel.

For some reason I actually believed this was cleared before interrupts
were enabled on bootup. Apparently not. On a virt guest I can easily
create a scenario in which scp'ing a file then kdumping, leaves the ethernet
interrupt in a triggered state.

Before this patch, it would be masked by disable_IO_APIC. With my patch
the irq nevers gets masked and during setup_local_APIC the kernel falls
over once the local APIC is enabled (as setup_IO_APIC is called later).
Perhaps calling setup_IO_APIC before setup_local_APIC would be a better
fix?

Just like NMIs prohibit the abilty to remove the disable local apic code,
an actively triggered interrupt seems to prevent us from removing the
disable io apic.

This leaves me with my original problem of deadlocking in the
disable_IO_APIC path.

Thoughts?

Cheers,
Don

>
> Seiji
>
>
> commit 1e75b31d638d5242ca8e9771dfdcbd28a5f041df
> Author: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
> Date: Thu Aug 25 12:01:11 2011 -0700
>
> x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
>
> In the kdump scenario mentioned below, we can have a case where
> the device using level triggered interrupt will not generate any
> interrupts in the kdump kernel.
>
> 1. IO-APIC sends a level triggered interrupt to the CPU's local APIC.
>
> 2. Kernel crashed before the CPU services this interrupt, leaving
> the remote-IRR in the IO-APIC set.
>
> 3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC
> initialization. But this fails to reset remote-IRR bit of the
> IO-APIC RTE as the remote-IRR bit is read-only.
>
> 4. Device using that level triggered entry can't generate any
> more interrupts because of the remote-IRR bit.
>
> In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if
> so do an explicit attempt to clear it (by doing EOI write on
> modern io-apic's and changing trigger mode to edge/level on
> older io-apic's). Also before doing the explicit EOI to the
> io-apic, ensure that the trigger mode is indeed set to level.
> This will enable the explicit EOI to the io-apic to reset the
> remote-IRR bit.
>
> Tested-by: Leonardo Chiquitto <lchiquitto@xxxxxxxxxx>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
> Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686
> Cc: Rafael Wysocki <rjw@xxxxxxxxxx>
> Cc: Maciej W. Rozycki <macro@xxxxxxxxxxxxxx>
> Cc: Thomas Renninger <trenn@xxxxxxx>
> Cc: jbeulich@xxxxxxxxxx
> Cc: yinghai@xxxxxxxxxx
> Link: http://lkml.kernel.org/r/20110825190657.157502602@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
>
> > -----Original Message-----
> > From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of Don Zickus
> > Sent: Thursday, March 15, 2012 4:27 PM
> > To: x86@xxxxxxxxxx
> > Cc: LKML; kexec-list; Eric W. Biederman; Vivek Goyal
> > Subject: Re: [PATCH] x86, kdump: No need to disable ioapic in crash path
> >
> > On Wed, Feb 29, 2012 at 03:08:49PM -0500, Don Zickus wrote:
> > > A customer of ours noticed when their machine crashed, kdump did not
> > > work but hung instead. Using their firmware dumping solution they
> > > grabbed a vmcore and decoded the stacks on the cpus. What they
> > > noticed seemed to be a rare deadlock with the ioapic_lock.
> >
> > While we are discussing the NMI stuff in another thread, does anyone have any objection to committing this patch. It fixes a real
> > problem today.
> >
> > Cheers,
> > Don
> >
> > >
> > > CPU4:
> > > machine_crash_shutdown
> > > -> machine_ops.crash_shutdown
> > > -> native_machine_crash_shutdown
> > > -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
> > > -> disable_IO_APIC
> > > -> clear_IO_APIC
> > > -> clear_IO_APIC_pin
> > > -> ioapic_read_entry
> > > -> spin_lock_irqsave(&ioapic_lock, flags)
> > > ---Infinite loop here---
> > >
> > > CPU0:
> > > do_IRQ
> > > -> handle_irq
> > > -> handle_edge_irq
> > > -> ack_apic_edge
> > > -> move_native_irq
> > > -> mask_IO_APIC_irq
> > > -> mask_IO_APIC_irq_desc
> > > -> spin_lock_irqsave(&ioapic_lock, flags)
> > > ---Receive NMI here after getting spinlock---
> > > -> nmi
> > > -> do_nmi
> > > -> crash_nmi_callback
> > > ---Infinite loop here---
> > >
> > > The problem is that although kdump tries to shutdown minimal hardware,
> > > it still needs to disable the IO APIC. This requires spinlocks which
> > > may be held by another cpu. This other cpu is being held infinitely
> > > in an NMI context by kdump in order to serialize the crashing path.
> > > Instant deadlock.
> > >
> > > Eric, brought up a point that because the boot code was restructured
> > > we may not need to disable the io apic any more in the crash path.
> > > The original concern that led to the development of disable_IO_APIC,
> > > was that the jiffies calibration on boot up relied on the PIT timer
> > > for reference. Access to the PIT required 8259 interrupts to be
> > > working. This wouldn't work if the ioapic needed to be configured.
> > > So on panic path, the ioapic was reconfigured to use virtual wire mode to allow the 8259 to passthrough.
> > >
> > > Those concerns don't hold true now, thanks to the jiffies calibration
> > > code not needing the PIT. As a result, we can remove this call and
> > > simplify the locking needed in the panic path.
> > >
> > > I tested kdump on an Ivy Bridge platform, a Pentium4 and an old athlon
> > > that did not have an ioapic. All three were successful.
> > >
> > > I also tested using lkdtm that would use jprobes to panic the system
> > > when entering do_IRQ. The idea was to see how the system reacted with
> > > an interrupt pending in the second kernel. My core2 quad successfully
> > > kdump'd
> > > 3 times in a row with no issues.
> > >
> > > v2: removed the disable lapic code too
> > > v3: re-add disabling of lapic code
> > >
> > > Cc: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> > > Cc: Vivek Goyal <vgoyal@xxxxxxxxxx>
> > > Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>
> > > ---
> > >
> > > There are really two problems here. One is the deadlock of the
> > > ioapic_lock that I describe above. Removing the code to disable the
> > > ioapic seems to resolve that.
> > >
> > > The second issue is handling non-IRQ exceptions like NMIs. Eric asked
> > > me to include removing the disable lapic code too. However, because
> > > the nmi watchdog is stil active and kexec zeros out the idt before it
> > > jumps to purgatory, an NMI that comes in during the transition between
> > > the first kernel and second kernel will see an empty idt and reset the cpu.
> > >
> > > Leaving the code to disable the lapic in, turns off perf and blocks
> > > those NMIs from happening (though an external NMI would still be an
> > > issue but that is no different than right now).
> > >
> > > I tried playing with a stub idt and leaving it in place through the
> > > transition to the second kernel, but I can't quite get it to work
> > > correctly. Spinning in the first kernel before the purgatory jump
> > > catches the idt properly. Spinning in purgatory before the second
> > > kernel jump doesn't. I even disabled the zero'ing out of the idt in the purgatory code.
> > >
> > > I would like to get resolution on the ioapic deadlock to fix a
> > > customer issue while working the idt and NMI thing on the side, hence
> > > the split of this patchset.
> > >
> > > Hopefully, people recognize there are two issues here and that this
> > > patch resolves the first one and the second one needs more debugging and time.
> > > ---
> > > arch/x86/kernel/crash.c | 3 ---
> > > 1 files changed, 0 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index
> > > 13ad899..b053cf9 100644
> > > --- a/arch/x86/kernel/crash.c
> > > +++ b/arch/x86/kernel/crash.c
> > > @@ -96,9 +96,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > > cpu_emergency_svm_disable();
> > >
> > > lapic_shutdown();
> > > -#if defined(CONFIG_X86_IO_APIC)
> > > - disable_IO_APIC();
> > > -#endif
> > > #ifdef CONFIG_HPET_TIMER
> > > hpet_disable();
> > > #endif
> > > --
> > > 1.7.7.6
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More
> > majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/