Re: [RFC v2] /dev/iommu uAPI proposal

From: Shenming Lu
Date: Thu Jul 15 2021 - 02:29:17 EST

On 2021/7/15 11:55, Tian, Kevin wrote:
>> From: Shenming Lu <lushenming@xxxxxxxxxx>
>> Sent: Thursday, July 15, 2021 11:21 AM
>> On 2021/7/9 15:48, Tian, Kevin wrote:
>>> 4.6. I/O page fault
>>> +++++++++++++++++++
>>> uAPI is TBD. Here is just about the high-level flow from host IOMMU
>>> driver to guest IOMMU driver and backwards. This flow assumes that
>>> I/O page faults are reported via IOMMU interrupts. Some devices
>>> report faults via device specific way instead of going through the
>>> IOMMU. That usage is not covered here:
>>> - Host IOMMU driver receives an I/O page fault with raw fault_data {rid,
>>> pasid, addr};
>>> - Host IOMMU driver identifies the faulting I/O page table according to
>>> {rid, pasid} and calls the corresponding fault handler with an opaque
>>> object (registered by the handler) and raw fault_data (rid, pasid, addr);
>>> - IOASID fault handler identifies the corresponding ioasid and device
>>> cookie according to the opaque object, generates a user fault_data
>>> (ioasid, cookie, addr) in the fault region, and triggers eventfd to
>>> userspace;
>> Hi, I have some doubts here:
>> For mdev, it seems that the rid in the raw fault_data is the parent
>> device's. In the vSVA scenario, how can we then get to know the
>> mdev (cookie) from the rid and pasid?
>> And from this point of view, would it be better to register the mdev
>> (iommu_register_device()) with the parent device info?
> This is what is proposed in this RFC. A successful binding generates a new
> iommu_dev object for each vfio device. For mdev this object includes
> its parent device, the defPASID marking this mdev, and the cookie
> representing it in userspace. Later, it is this iommu_dev that is recorded
> in the attaching_data when the mdev is attached to an IOASID:
>     struct iommu_attach_data *__iommu_device_attach(
>             struct iommu_dev *dev, u32 ioasid, u32 pasid, int flags);
> Then when a fault is reported, the fault handler just needs to figure out
> iommu_dev according to {rid, pasid} in the raw fault data.

Yeah, we have the defPASID that marks the mdev and refers to the default
I/O address space, but what about the non-default I/O address spaces?
Is there a case where two different mdevs (on the same parent device)
are used by the same process in the guest, and thus share the same pasid
route in the physical IOMMU? It seems that we can't figure out the mdev
from the rid and pasid in this case...
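
To make the concern concrete, here is a minimal userspace sketch of the
lookup. It is only an illustration under my own assumptions: the struct
layouts, attach_table, record_attach() and lookup_fault() are hypothetical
names and are not taken from the RFC or from any kernel code. It models an
attach record as {iommu_dev, pasid} and counts how many records match the
raw fault data {rid, pasid}:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-mdev object described in the RFC. */
struct iommu_dev {
	uint16_t parent_rid;	/* RID of the parent device */
	uint32_t def_pasid;	/* defPASID marking this mdev */
	uint64_t cookie;	/* cookie representing it in userspace */
};

/* Hypothetical attach record: which iommu_dev is attached with which PASID. */
struct attach_entry {
	const struct iommu_dev *dev;
	uint32_t pasid;
};

#define MAX_ATTACH 8
static struct attach_entry attach_table[MAX_ATTACH];
static size_t n_attached;

/* Record that @dev was attached with @pasid. */
static void record_attach(const struct iommu_dev *dev, uint32_t pasid)
{
	assert(n_attached < MAX_ATTACH);
	attach_table[n_attached].dev = dev;
	attach_table[n_attached].pasid = pasid;
	n_attached++;
}

/*
 * Return how many attach records match the raw fault data {rid, pasid}.
 * If @first is non-NULL, the first matching iommu_dev is stored there.
 * A return value above 1 means {rid, pasid} alone cannot identify the mdev.
 */
static size_t lookup_fault(uint16_t rid, uint32_t pasid,
			   const struct iommu_dev **first)
{
	size_t hits = 0;

	for (size_t i = 0; i < n_attached; i++) {
		if (attach_table[i].dev->parent_rid != rid ||
		    attach_table[i].pasid != pasid)
			continue;
		if (hits == 0 && first)
			*first = attach_table[i].dev;
		hits++;
	}
	return hits;
}
```

With two mdevs on the same parent, a fault on either defPASID gives exactly
one match, but if both are attached with the same non-default pasid the
lookup returns two candidates, which is the case I am asking about.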

Did I misunderstand something?... :-)