Re: [PATCH 00/19] Fix Intel IOMMU breakage in kdump kernel

From: David Woodhouse
Date: Tue Jun 23 2015 - 10:39:08 EST


On Tue, 2015-06-23 at 16:06 +0200, Joerg Roedel wrote:
> On Tue, Jun 23, 2015 at 02:31:30PM +0100, David Woodhouse wrote:
> > However, it's still fairly gratuitous for all non-broken hardware, and
> > will tend to hide hardware and driver bugs during testing of new
> > hardware.
> >
> > I'd much rather see this limited to a blacklist of known-broken
> > devices, and accompanied by a kernel message along the lines of
> >
> > 'Preserving VT-d page tables for broken HP device xxxx:xxxx'
> >
> > For *any* device which isn't so broken that it craps itself on taking
> > a DMA fault and cannot be reset, this page table copy shouldn't be
> > needed, right?
>
> In theory yes, but as it came to my mind recently, there is this BIOS
> "value-add" called APEI (ACPI Platform Error Interface) which has a
> 'Firmware first' mode.
>
> So when this is active the firmware handles any errors happening in the
> system and reports them to the OS with a severity it can decide on its
> own.
>
> Such errors could be DMA target aborts, for example. And I have seen
> systems where at least rejected interrupt requests were reported to the
> OS as fatal errors, causing a kernel panic in Linux. But the firmware is
> also free to report ordinary DMA failures as fatal errors, who knows...

Yay for BIOS value subtract.

The thing is, this would be utterly broken. The IOMMU is supposed to
protect us from rogue devices. In this hypothetical scenario, a device
can bring the entire system down and we have no chance to isolate it
and recover. It means that assigning devices to guests should be
*disallowed* because it can't be done securely.

On this kind of system, we might as well turn off the IOMMU entirely,
since in a lot of respects it's only making things *worse*.

> So while you are right that these changes might hide hardware and driver
> bugs, I think it is still the best to try avoiding such faults at all
> costs in the kdump kernel to actually get a dump, even if the device
> would actually be able to recover from the master abort.

How about an *option* to do it for all devices (which in turn can
perhaps be triggered by a system-level blacklist for things like APEI,
or perhaps just a system DMI match on "HP")?

--
dwmw2
