Re: [PATCH 00/19] Fix Intel IOMMU breakage in kdump kernel

From: Joerg Roedel
Date: Tue Jun 23 2015 - 10:06:48 EST


On Tue, Jun 23, 2015 at 02:31:30PM +0100, David Woodhouse wrote:
> However, it's still fairly gratuitous for all non-broken hardware, and
> will tend to hide hardware and driver bugs during testing of new
> hardware.
>
> I'd much rather see this limited to a blacklist of known-broken
> devices, and accompanied by a kernel message along the lines of
>
> 'Preserving VT-d page tables for broken HP device xxxx:xxxx'
>
> For *any* device which isn't so broken that it craps itself on taking
> a DMA fault and cannot be reset, this page table copy shouldn't be
> needed, right?

In theory yes, but it occurred to me recently that there is this BIOS
"value-add" called APEI (ACPI Platform Error Interface), which has a
'Firmware first' mode.

When this is active, the firmware handles any errors happening in the
system and reports them to the OS with a severity of its own choosing.

Such errors could be DMA target aborts, for example. And I have seen
systems where at least rejected interrupt requests were reported to the
OS as fatal errors, causing a kernel panic in Linux. But the firmware is
also free to report ordinary DMA failures as fatal errors, who knows...

So while you are right that these changes might hide hardware and driver
bugs, I think it is still best to avoid such faults at all costs in the
kdump kernel so that we actually get a dump, even if the device would be
able to recover from the master abort.
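
To make the two options discussed above concrete, here is a minimal
sketch of the decision. It is not code from the patch set: the function
names, the table, and the PCI IDs are hypothetical placeholders, and it
assumes firmware-first mode is available as a simple flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vendor:device pair; not a real kernel structure. */
struct quirk_id {
	uint16_t vendor;
	uint16_t device;
};

/* Placeholder entry only; real IDs would have to come from bug reports. */
static const struct quirk_id preserve_blacklist[] = {
	{ 0x1234, 0x5678 },
};

/* Return true if the device is listed as unable to survive a DMA fault. */
static bool on_blacklist(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(preserve_blacklist) / sizeof(preserve_blacklist[0]);

	for (size_t i = 0; i < n; i++) {
		if (preserve_blacklist[i].vendor == vendor &&
		    preserve_blacklist[i].device == device)
			return true;
	}
	return false;
}

/*
 * Decide whether the kdump kernel should keep the old page tables.
 * With firmware-first error handling, any DMA fault might be reported
 * as fatal, so the tables are preserved regardless of the device.
 */
static bool should_preserve_tables(bool in_kdump, bool firmware_first,
				   uint16_t vendor, uint16_t device)
{
	if (!in_kdump)
		return false;
	if (firmware_first)
		return true;
	return on_blacklist(vendor, device);
}
```

The blacklist-only policy corresponds to the last line of
should_preserve_tables(); the firmware_first check is the case that a
device blacklist alone does not cover.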



Joerg
