Re: [PATCH v2 2/7] PCI: don't touch enable_cnt in pci_device_shutdown()

From: Bjorn Helgaas
Date: Tue Feb 05 2013 - 14:23:31 EST


On Tue, Feb 5, 2013 at 8:28 AM, Khalid Aziz <khalid@xxxxxxxxxxxxxx> wrote:
> On Mon, 2013-02-04 at 16:13 -0700, Bjorn Helgaas wrote:
>> On Mon, Feb 4, 2013 at 3:20 PM, Khalid Aziz <khalid@xxxxxxxxxxxxxx> wrote:
>> > On Mon, 2013-02-04 at 15:55 +0400, Konstantin Khlebnikov wrote:
>> >> Matthew Garrett and Alan Cox said (see LKML link below) that clearing bus-master
>> >> for all PCI devices may lead to unpredictable consequences, some devices ignores
>> >> this bit and continues DMA, some of them hang after that or crash whole system.
>> >> Probably we should leave here only warning and disable bus-mastering for each
>> >> driver individually in ->shutdown() callback.
>> >
>> > Agreed that the right place for shutting down a PCI device properly and
>> > clearing its Bus Master bit, is the driver shutdown routine, if only all
>> > drivers supplied a shutdown routine. As it is today, there are too many
>> > drivers that do not provide a shutdown routine, ata_piix, Marvell SATA
>> > driver, ATI AGP driver just to name a few among a large number of them.
>> > Yet kexec is expected to work inspite of these drivers especially since
>> > kdump depends on it. So until all PCI drivers supply a shutdown routine,
>> > this is just a band-aid to disable interrupt and Bus Master bit in
>> > pci_device_shutdown(). Most drivers do seem to supply a suspend and
>> > resume function and it was discussed many years ago if it is feasible to
>> > use the suspend() routine for drivers to shut devices down cleanly.
>> > Maybe it is time to revisit that discussion.
>>
>> This patch as posted doesn't do anything with IRQs. It only clears
>> PCI_COMMAND_MASTER.
>>
>> I'm open to considering something with IRQs, but I don't understand
>> exactly what we should do. In your response to the previous version
>> (https://lkml.org/lkml/2013/1/28/720) you suggested this:
>>
>> pci_clear_master(pci_dev);
>> pcibios_disable_device(pci_dev);
>>
>> Did you figure out specifically why pcibios_disable_device() helps?
>> Using pcibios_disable_device() doesn't seem like the ideal solution
>> because on most architectures, it is an empty function with no obvious
>> connection to IRQs. On x86 with ACPI, it cleans up some ACPI PCI IRQ
>> stuff, but as far as I can tell, it doesn't actually touch the PCI
>> device itself or even the IOAPIC to which it's connected, so I'm not
>> sure how this would help kexec.
>
> My reading of the code was that pcibios_disable_device() does clear the
> interrupt on x86 and ia64. I am not deeply familiar with the ACPI code
> and I might be interpreting it incorrectly, so please do correct me if I
> am reading it incorrectly. Here is the code sequence I see:
>
> pcibios_disable_device() ->
> pcibios_disable_irq() ->
> acpi_pci_irq_disable() ->
> acpi_pci_link_free_irq() ->
> acpi_evaluate_object(link->device->handle, "_DIS", NULL,
> NULL);
>
> My understanding is the evaluation of ACPI _DIS method will disable the
> interrupt from the device. Does that sound reasonable?

I see the code you're looking at in acpi_pci_link_free_irq(), but we
only evaluate _DIS if link->refcnt == 0, and I don't think refcnt is
ever zero at that point.

refcnt starts out at zero in acpi_pci_link_add() (called when we find
PNP0C0F devices), and it's incremented in acpi_pci_link_allocate_irq()
(called in the pci_enable_device() path), but as far as I can tell,
it's never decremented, so I doubt that _DIS is ever evaluated.

If we did evaluate _DIS, it would act on an "interrupt link" device,
not on the PCI device itself. I guess that could help, but only for
devices connected to such a link device. For others, I guess we might
be able to accomplish something similar by updating local APIC and/or
IOAPIC config. I don't think we do that today, at least not in the
pci_disable_device() path, but it might be something interesting to
explore. There is also the INTx Disable bit, though it's obviously
only on new PCI devices.

> ... The two ways I can think of are to stop
> DMA by clearing Bus Master bit and turn off the interrupt, which have
> been shown to get kexec (and thus kdump) working on machines it didn't
> work on before.

I was just curious if you had actually verified that _DIS was being
evaluated and making a difference here, or if the Bus Master bit was
really the important part.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/