Re: [PATCH v2 08/16] iommu: introduce device fault data

From: Jacob Pan
Date: Mon Nov 13 2017 - 11:11:13 EST


On Mon, 13 Nov 2017 13:19:50 +0000
Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> wrote:

> On 11/11/17 00:00, Jacob Pan wrote:
> > On Fri, 10 Nov 2017 13:54:59 +0000
> > Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> wrote:
> >
> >> /*
> >> * Note: I tried to synthesize what I believe would be useful to device
> >> * drivers and guests, with regard to the kind of faults that the ARM
> >> * SMMU is capable of reporting. Other IOMMUs may report more or fewer
> >> * fault reasons, and I guess one should try to map each fault to the
> >> * generic reason that best matches it when reporting it.
> >> *
> >> * Finer reason granularity is probably not useful to anyone, and
> >> * coarser granularity would require more work from intermediate
> >> * components processing the fault to figure out what happened, whose
> >> * fault it actually is and where to route it (process vs. device driver
> >> * vs. vIOMMU driver misprogramming tables).
> >> */
> >> enum iommu_fault_reason {
> >> IOMMU_FAULT_REASON_UNKNOWN = 0,
> >>
> > Can we add one for IOMMU internal errors? The specifics may not be
> > useful to device drivers, but it is good to know that the IOMMU is
> > faulting; the driver can then take action to inform its users.
> > e.g.:
> > /* IOMMU internal error, no specific reason to report out */
> > IOMMU_FAULT_REASON_INTERNAL,
>
> Yes, and maybe it should replace IOMMU_FAULT_REASON_UNKNOWN, since the
> device driver would probably handle it the same way. I guess the
> INTERNAL fault is always fatal, meaning that the IOMMU is shutting
> down and there is no hope of recovery? I can't see a good reason to
> inform users of non-fatal internal faults; the IOMMU driver will
> print those to dmesg and keep going. For fatal faults, we're telling
> users to stop issuing transactions so that the IOMMU driver doesn't
> get flooded by events, for example.
>
Internal faults are not always fatal in VT-d, e.g. programming reserved
bits, though such an error should already be caught when the entry is
programmed synchronously, as the code does today.
I agree we should put all internal faults under
IOMMU_FAULT_REASON_INTERNAL and drop UNKNOWN, which is ambiguous: the
reason is unknown to the device but known to the IOMMU.
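A minimal sketch of where this exchange seems to land, assuming UNKNOWN is
dropped in favour of INTERNAL. Because VT-d internal faults are not always
fatal, fatality is carried here as a separate flag rather than implied by the
reason. IOMMU_FAULT_REASON_PERMISSION, IOMMU_FAULT_FLAG_FATAL,
struct example_fault, enum example_action and example_route_fault() are
hypothetical names for illustration only, not part of the patch:

```c
#include <stdbool.h>

/* Hypothetical: UNKNOWN removed, INTERNAL takes value 0 instead. */
enum iommu_fault_reason {
	/* IOMMU internal error, no specific reason to report out */
	IOMMU_FAULT_REASON_INTERNAL = 0,
	/* Hypothetical placeholder for a device-attributable reason */
	IOMMU_FAULT_REASON_PERMISSION,
};

/* Hypothetical flag: internal faults may or may not be fatal (e.g. VT-d) */
#define IOMMU_FAULT_FLAG_FATAL (1u << 0)

struct example_fault {
	enum iommu_fault_reason reason;
	unsigned int flags;
};

enum example_action {
	ACTION_LOG_ONLY,         /* IOMMU driver logs it to dmesg and keeps going */
	ACTION_STOP_DMA,         /* IOMMU is going down: stop issuing transactions */
	ACTION_HANDLE_IN_DRIVER, /* device-attributable fault: driver handles it */
};

/* Decide what a device driver might do with a reported fault. */
enum example_action example_route_fault(const struct example_fault *f)
{
	bool fatal = f->flags & IOMMU_FAULT_FLAG_FATAL;

	if (f->reason == IOMMU_FAULT_REASON_INTERNAL)
		return fatal ? ACTION_STOP_DMA : ACTION_LOG_ONLY;

	return ACTION_HANDLE_IN_DRIVER;
}
```

With this split, a non-fatal internal fault (e.g. a VT-d reserved-bit error)
never needs to reach the device driver's users, while a fatal one tells them
to stop issuing transactions, matching both points raised above.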