RE: [PATCH RFC 10/11] iommu: Make IOPF handling framework generic

From: Tian, Kevin
Date: Tue Mar 22 2022 - 06:24:35 EST


> From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> Sent: Tuesday, March 22, 2022 6:06 PM
>
> On Tue, Mar 22, 2022 at 01:00:08AM +0000, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> > > Sent: Monday, March 21, 2022 7:42 PM
> > >
> > > Hi Kevin,
> > >
> > > On Mon, Mar 21, 2022 at 08:09:36AM +0000, Tian, Kevin wrote:
> > > > > From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> > > > > Sent: Sunday, March 20, 2022 2:40 PM
> > > > >
> > > > > The existing IOPF handling framework only handles the I/O page faults
> > > > > for SVA. Given that we are able to link an iommu domain with each I/O
> > > > > page fault, we can now make the I/O page fault handling framework more
> > > > > general for more types of page faults.
> > > >
> > > > "make ... generic" in the subject line is kind of confusing. Reading this
> > > > patch I think you really meant changing from per-device fault handling to
> > > > per-domain fault handling. This is more accurate in concept since the
> > > > fault is caused by the domain page table. 😊
> > >
> > > I tend to disagree with that last part. The fault is caused by a specific
> > > device accessing shared page tables. We should keep that device
> > > information throughout the fault handling, so that we can report it to the
> > > driver when things go wrong. A process can have multiple threads bound to
> > > different devices; they share the same mm, so if the driver wanted to
> > > signal a misbehaving thread, similarly to a SEGV on the CPU side, it would
> > > need the device information to precisely report it to userspace.
> > >
> >
> > The iommu driver can include the device information in the fault data. But
> > conceptually the IOPF should be reported per domain.
>
> So I don't remember where we left off on that topic: what about fault
> injection into guests? In that case the device info is more than just
> diagnostic; fault injection can't work without it. I think we talked about
> passing a device cookie to userspace, just want to make sure.
>
> > And I agree with Jason that at most we can send SEGV to the entire thread
> > group, since there is no way to associate a DMA back to the thread that
> > initiated it.
>
> The point is providing the most accurate information to the device driver
> for diagnostics and debugging. If a process opens multiple queues to
> different devices and one of those queues issues an invalid DMA, the
> driver won't even know which queue is broken if you only report the target
> mm and not the source dev. I don't think we gain anything from discarding
> the device information from the fault path.
>

In case I didn't make it clear, what I was talking about is only whether the
iommu core reports an IOPF to a per-domain handler or to a per-device handler.
This design choice doesn't change what the fault data should include (device,
pasid, addr, etc.), i.e. it always includes all the information provided by the
iommu driver, no matter how the fault is reported upwards.
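
To make the distinction concrete, here is a minimal userspace C sketch. All
type and function names (sketch_device, sketch_fault, sketch_domain,
sketch_report_fault, record_fault) are hypothetical and are not the kernel's
actual IOPF API. The core picks the handler from the faulting domain rather
than from the device, but it passes the complete fault record through, so the
handler still sees the device, PASID and address exactly as the iommu driver
reported them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the kernel's struct device / iommu_fault. */
struct sketch_device {
	const char *name;
};

/* Everything the iommu driver reported about one fault. */
struct sketch_fault {
	const struct sketch_device *dev;	/* source device */
	uint32_t pasid;				/* PASID of the request */
	uint64_t addr;				/* faulting I/O virtual address */
};

struct sketch_domain;
typedef int (*sketch_iopf_handler)(struct sketch_domain *dom,
				   const struct sketch_fault *fault);

/* The handler is registered per domain, not per device. */
struct sketch_domain {
	sketch_iopf_handler handler;
	const struct sketch_device *last_dev;	/* bookkeeping for the example */
	uint64_t last_addr;
};

/*
 * Core dispatch: route by domain, but hand the whole fault record to the
 * handler unchanged, so no device information is dropped on the way up.
 */
static int sketch_report_fault(struct sketch_domain *dom,
			       const struct sketch_fault *fault)
{
	assert(dom != NULL && fault != NULL);
	if (!dom->handler)
		return -1;	/* no handler registered on this domain */
	return dom->handler(dom, fault);
}

/* Example domain handler: records which device faulted and where. */
static int record_fault(struct sketch_domain *dom,
			const struct sketch_fault *fault)
{
	dom->last_dev = fault->dev;
	dom->last_addr = fault->addr;
	return 0;
}
```

In this sketch two devices attached to the same domain share one handler, yet
the handler can still tell them apart through fault->dev.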

E.g. with iommufd, it is iommufd that registers an IOPF handler per managed
domain and receives IOPFs on those domains. If necessary, iommufd then forwards
the fault to userspace, including the device cookie derived from the fault data.
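
The forwarding step could look roughly like the C sketch below. It assumes,
hypothetically, that the cookie is an opaque 32-bit ID that userspace chose
when binding the device, and that the domain owner keeps a small table mapping
bound devices to those cookies. None of these names (sketch_binding,
sketch_user_fault, sketch_forward_fault) are the real iommufd uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; NOT the real iommufd uAPI. */
struct sketch_device {
	const char *name;
};

/* Fault record as delivered by the iommu core to the domain handler. */
struct sketch_fault {
	const struct sketch_device *dev;
	uint32_t pasid;
	uint64_t addr;
};

/* One binding: kernel device <-> cookie chosen by userspace (assumed). */
struct sketch_binding {
	const struct sketch_device *dev;
	uint32_t cookie;
};

/* Record forwarded to userspace: the device pointer becomes a cookie. */
struct sketch_user_fault {
	uint32_t dev_cookie;
	uint32_t pasid;
	uint64_t addr;
};

/*
 * Translate a fault received on a managed domain into the userspace
 * format. Returns 0 on success, -1 if the faulting device is not bound.
 */
static int sketch_forward_fault(const struct sketch_binding *tbl, size_t n,
				const struct sketch_fault *fault,
				struct sketch_user_fault *out)
{
	assert(fault != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].dev == fault->dev) {
			out->dev_cookie = tbl[i].cookie;
			out->pasid = fault->pasid;
			out->addr = fault->addr;
			return 0;
		}
	}
	return -1;
}
```

Because the lookup keys on fault->dev, this only works if the per-domain
handler still receives the device in the fault data. That is the point above:
per-domain reporting and full device information are not in conflict.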

Thanks
Kevin