Re: [PATCH v2] PCI: Reset PCIe devices to stop ongoing DMA

From: Takao Indoh
Date: Sun Jul 28 2013 - 20:20:59 EST


(2013/07/25 23:24), Vivek Goyal wrote:
> On Wed, Jul 24, 2013 at 03:29:58PM +0900, Takao Indoh wrote:
>> Sorry for letting this discussion slide, I was busy with other work :-(
>> Anyway, the summary of the previous discussion is:
>> - My patch adds a new initcall (fs_initcall) to reset all PCIe endpoints
>>   on boot. This expects PCI enumeration to be done before IOMMU
>>   initialization, as follows:
>>   (1) PCI enumeration
>>   (2) fs_initcall ---> device reset
>>   (3) IOMMU initialization
>> - This works on x86, but does not work on other architectures because
>>   the IOMMU is initialized before PCI enumeration on some of them. So
>>   the device reset should be done where the IOMMU is initialized instead
>>   of in an initcall.
>> - Or, as another idea, we can reset devices in the first kernel (panic
>>   kernel).
>>
>> Resetting devices in the panic kernel is against kdump policy and does not
>> seem to be a good idea. So I think adding the reset code to IOMMU
>> initialization is better. I'll post patches for that.
>
> I don't understand all the details, but I agree that the idea of trying to
> reset the IOMMU in the crashed kernel might not fly.
>
>>
>> Another discussion point is how to handle buggy devices. Resetting buggy
>> devices makes the system more unstable. One idea is to use a boot parameter
>> so that the user can choose whether to reset devices.
>
> So who would decide which device is buggy and should not be reset? Give
> some details here.

I found a case where kdump does not work after resetting devices but
works once the reset patch is removed. The cause of the problem is a bug
in a PCIe switch chip. If there is a boot parameter to disable the device
reset, the user can use it as a workaround.
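
To make the boot parameter idea concrete, here is a minimal userspace
sketch of the check. It is not the patch code: the option name
"pcie_noreset" and the function pcie_reset_wanted() are hypothetical, and
a real implementation would hook into the kernel's own parameter parsing
instead of scanning the command line by hand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option name, used only for this illustration. */
#define NO_RESET_OPT "pcie_noreset"

/*
 * Return true if the device reset should run, i.e. the hypothetical
 * "pcie_noreset" token does not appear as a whole space-separated
 * word in the given command line.
 */
bool pcie_reset_wanted(const char *cmdline)
{
	const size_t optlen = strlen(NO_RESET_OPT);
	const char *p = cmdline;

	while ((p = strstr(p, NO_RESET_OPT)) != NULL) {
		bool starts_word = (p == cmdline) || (p[-1] == ' ');
		bool ends_word = (p[optlen] == '\0') || (p[optlen] == ' ');

		if (starts_word && ends_word)
			return false;	/* user asked to skip the reset */
		p += optlen;		/* partial match, keep scanning */
	}
	return true;
}
```

With this sketch, a command line such as "root=/dev/sda1 pcie_noreset"
disables the reset, while one that does not contain the token leaves the
reset enabled.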

I think in this case we should add a PCI quirk so that this buggy
hardware is not reset, but we need to wait for errata from the vendor,
and that usually takes a long time.
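
For reference, the quirk approach boils down to a table of devices that
must not be reset. The following is only a userspace sketch of that
lookup, not the kernel quirk mechanism itself. The struct, the function
pci_skip_reset() and the IDs in the table are placeholders I made up;
the real IDs would come from the vendor errata.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a PCI vendor/device ID pair. */
struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Placeholder entry only; the buggy switch is not identified here. */
static const struct pci_id no_reset_list[] = {
	{ 0xffff, 0x0001 },
};

/* Return true if the given device is listed and must not be reset. */
bool pci_skip_reset(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(no_reset_list) / sizeof(no_reset_list[0]);

	for (size_t i = 0; i < n; i++) {
		if (no_reset_list[i].vendor == vendor &&
		    no_reset_list[i].device == device)
			return true;
	}
	return false;
}
```

The reset loop would call pci_skip_reset() for each endpoint and leave
listed devices alone, so no user action is needed once the entry exists.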

>
> Can't we simply blacklist the associated module, so that it never loads
> and then never tries to reset the devices?
>

So you mean that the device reset should be done when its driver is loaded?

Thanks,
Takao Indoh
