Re: [PATCH] x86_64: restore mask_bits in msi shutdown

From: Yinghai Lu
Date: Thu Apr 17 2008 - 06:06:43 EST


On Thu, Apr 17, 2008 at 2:19 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
>
> Yinghai Lu <yhlu.kernel.send@xxxxxxxxx> writes:
>
> > I cannot kexec RHEL 5.1 from 2.6.25-rc3 or later
> >
> > caused by:
> > commit 89d694b9dbe769ca1004e01db0ca43964806a611
> > Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Date: Mon Feb 18 18:25:17 2008 +0100
> >
> > genirq: do not leave interupts enabled on free_irq
> >
> > The default_disable() function was changed in commit:
> >
> > 76d2160147f43f982dfe881404cfde9fd0a9da21
> > genirq: do not mask interrupts by default
> >
> > It removed the mask function in favour of the default delayed
> > interrupt disabling. Unfortunately this also broke the shutdown in
> > free_irq() when the last handler is removed from the interrupt for
> > those architectures which rely on the default implementations. Now we
> > can end up with a enabled interrupt line after the last handler was
> > removed, which can result in spurious interrupts.
> >
> > Fix this by adding a default_shutdown function, which is only
> > installed, when the irqchip implementation does provide neither a
> > shutdown nor a disable function.
> >
> > [@stable: affected versions: .21 - .24 ]
> >
> >
> >
> > For MSI, default_shutdown will call mask_bit for the MSI device, so every
> > vector will be left masked after free_irq.
> > If we then kexec a next kernel that can only use the msi_enable bit,
> > none of the devices' MSIs can be used.
> >
> > So try to restore the MSI mask bits that were saved before MSI was used
> > in the first kernel.
> >
> > Signed-off-by: Yinghai Lu <yhlu.kernel@xxxxxxxxx>
>
> Ouch! In the case of MSI-X this is horrible. Reenabling an interrupt
> line when we are not using it. That is likely to cause even stranger
> things than kexec to fail.
>
> What happens when someone next comes to use that msi interrupt is
> a reasonable question.
>
> The PCI standard describes the state the bits in the msi capability
> are supposed to be in after reset, and if a driver is going to
> assume some state, that is the only reasonable state for a driver to
> expect the hardware to be in. So we don't need to perform a
> save/restore cycle.
>
> Could you look at having pci_disable_msi reset the mask bit
> to its default state after we have called msi_set_enable(dev, 0)?
> Once the msi capability is disabled the mask bit has no effect.

But the next kernel (RHEL 5.1) can only enable MSI, and it doesn't
touch the mask bits, which are left as 0xff by the first kernel
(2.6.25-rc2 or later), so the device (nvidia mcp55 nic) doesn't work.
If I manually use setpci to set 0x60 to 0xfe or 0x00, it works. Also,
if I boot the RHEL 5.1 kernel directly (without kexec), 0x60 is always
0x00 and the nic works with MSI.
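
To make the failure concrete, here is a rough userspace model of the
sequence, not kernel code. The struct and function names are made up for
illustration; only the values (0xff, 0xfe, 0x00) and the "next kernel only
sets the enable bit" behaviour come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the MSI state of one device (not a kernel struct). */
struct msi_model {
    int      enabled;    /* MSI enable bit in the message control word */
    uint32_t mask_bits;  /* per-vector mask register (offset 0x60 on this nic) */
};

/* First kernel, current behaviour: mask the vectors, then disable MSI. */
static void first_kernel_shutdown(struct msi_model *m)
{
    m->mask_bits = 0xff;
    m->enabled = 0;
}

/* First kernel, proposed behaviour: same, but put the mask bits back. */
static void first_kernel_shutdown_restore(struct msi_model *m, uint32_t saved)
{
    first_kernel_shutdown(m);
    m->mask_bits = saved;  /* saved value, or the reset default (0x00) */
}

/* Next kernel (assumed to behave like RHEL 5.1): only sets the enable bit. */
static void next_kernel_enable(struct msi_model *m)
{
    m->enabled = 1;
}

/* A vector can deliver an interrupt only if MSI is on and it is unmasked. */
static int vector_can_fire(const struct msi_model *m, unsigned int vec)
{
    assert(vec < 32);
    return m->enabled && !(m->mask_bits & (1u << vec));
}
```

With first_kernel_shutdown() followed by next_kernel_enable(), vector 0
stays masked and cannot fire. With first_kernel_shutdown_restore(&nic, 0x00),
or with the mask set to 0xfe by hand as in the setpci case, vector 0 can
fire again.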

YH