Re: [PATCH v6 0/6] iommufd: Add nesting infrastructure (part 2/2)

From: Suthikulpanit, Suravee
Date: Mon Dec 11 2023 - 10:34:29 EST




On 12/11/2023 8:05 PM, Jason Gunthorpe wrote:
On Mon, Dec 11, 2023 at 08:36:46PM +0800, Yi Liu wrote:
On 2023/12/11 10:29, Tian, Kevin wrote:
From: Jason Gunthorpe <jgg@xxxxxxxxxx>
Sent: Saturday, December 9, 2023 9:47 AM

What is in a Nested domain:
  Intel: A single IO page table referred to by a PASID entry
         Each vDomain-ID,PASID allocates a unique nesting domain
  AMD: A GCR3 table pointer
       Nesting domains are created for every unique GCR3 pointer.
       vDomain-ID can possibly refer to multiple Nesting domains :(
  ARM: A CD table pointer
       Nesting domains are created for every unique CD table top pointer.

this AMD/ARM difference is not very clear to me.

How could a vDomain-ID refer to multiple GCR3 pointers? Wouldn't it
lead to a cache tag conflict when the same PASID entry in multiple GCR3 tables
points to different I/O page tables?

Perhaps due to only one DomainID in the DTE table indexed by BDF? Actually,
the vDomainID will not be used to tag cache, the host DomainId would be
used instead. @Jason?

The DomainID comes from the DTE table which is indexed by the RID, and
the DTE entry points to the GCR3 table. So the VM certainly can set up
a DTE table with multiple entries having the same vDomainID but
pointing to different GCR3's. So the VMM has to do *something* with
this.

Most likely this is not a useful thing to do. However what should the
VMM do when it sees this? Block a random DTE or push the duplication
down to real HW would be my options. I'd probably try to do the latter
just on the basis of better emulation.

Jason

For AMD, the hardware uses host DomainID (hDomainId) and PASID to tag the IOMMU TLB.

The VM can set up the vDomainID independently of the device (RID) and the hDomainID. The vDomainId->hDomainId mapping would be managed by the host IOMMU driver (since this is also needed by the HW when enabling the HW-vIOMMU support, a.k.a. virtual function).

Currently, the AMD IOMMU driver allocates a DomainId per IOMMU group.
One issue with this arises with nested translation, where we could end up with multiple devices (RIDs) sharing the same PASID and the same hDomainID.

For example:

- Host view
    Device1 (RID 1) w/ hDomainId 1
    Device2 (RID 2) w/ hDomainId 1
- Guest view
    Pass-through Device1 (vRID 3) w/ vDomainID A + PASID 0
    Pass-through Device2 (vRID 4) w/ vDomainID B + PASID 0
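
To make the conflict concrete, here is a minimal Python sketch of a TLB keyed the way described above, i.e. by hDomainId and PASID (plus the I/O virtual address). The class, its method names, and the addresses are made up for illustration; they do not correspond to AMD IOMMU driver or hardware structures.

```python
class TaggedTlb:
    """Toy IOMMU TLB: entries are tagged by (hDomainId, PASID, IOVA)."""

    def __init__(self):
        self.entries = {}

    def fill(self, hdomid, pasid, iova, spa):
        # Cache a translation under the given tag.
        self.entries[(hdomid, pasid, iova)] = spa

    def lookup(self, hdomid, pasid, iova):
        # Return the cached address, or None on a miss.
        return self.entries.get((hdomid, pasid, iova))


tlb = TaggedTlb()

# Device1 (vDomainID A, PASID 0) is translated through its own guest table
# and the result is cached under the shared hDomainId 1.
tlb.fill(hdomid=1, pasid=0, iova=0x1000, spa=0xAAAA000)

# Device2 (vDomainID B, PASID 0) also carries hDomainId 1, so its lookup
# builds the same tag and hits Device1's entry.
aliased_hit = tlb.lookup(hdomid=1, pasid=0, iova=0x1000)
```

In this model the guest's vDomainID never reaches the tag, so nothing distinguishes the two devices once they share an hDomainId.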

We should be able to work around this by changing the way we assign hDomainId to be per-device for VFIO pass-through devices, even though they share the same v1 (stage-2) page table. This would look like:

- Host view
    Device1 (RID 1) w/ hDomainId 1
    Device2 (RID 2) w/ hDomainId 2
- Guest view
    Pass-through Device1 (vRID 3) w/ vDomainID A + PASID 0
    Pass-through Device2 (vRID 4) w/ vDomainID B + PASID 0

This should avoid the IOMMU TLB conflict. However, the invalidation would need to be done for both DomainId 1 and 2 when updating the v1 (stage-2) page table.
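
As a rough illustration of that trade-off, the following Python sketch models a TLB tagged by (hDomainId, PASID, IOVA) with one hDomainId per device, and an invalidation helper that walks every hDomainId attached to the shared page table. All names (TaggedTlb, invalidate_shared_table, sharing_hdomids) and addresses are hypothetical and are not taken from the AMD IOMMU driver.

```python
class TaggedTlb:
    """Toy IOMMU TLB: entries are tagged by (hDomainId, PASID, IOVA)."""

    def __init__(self):
        self.entries = {}

    def fill(self, hdomid, pasid, iova, spa):
        self.entries[(hdomid, pasid, iova)] = spa

    def lookup(self, hdomid, pasid, iova):
        # Return the cached address, or None on a miss.
        return self.entries.get((hdomid, pasid, iova))

    def invalidate_domain(self, hdomid):
        # Drop every entry whose tag carries this hDomainId.
        self.entries = {
            tag: spa for tag, spa in self.entries.items() if tag[0] != hdomid
        }


def invalidate_shared_table(tlb, hdomids):
    # An update to the shared page table must be flushed once per
    # hDomainId that uses it.
    for hdomid in hdomids:
        tlb.invalidate_domain(hdomid)


tlb = TaggedTlb()

# Per-device hDomainIds: the same PASID 0 and IOVA no longer collide.
tlb.fill(hdomid=1, pasid=0, iova=0x1000, spa=0xAAAA000)  # Device1
tlb.fill(hdomid=2, pasid=0, iova=0x1000, spa=0xBBBB000)  # Device2
dev1_before = tlb.lookup(1, 0, 0x1000)
dev2_before = tlb.lookup(2, 0, 0x1000)

# Hypothetical bookkeeping: hDomainIds attached to the one shared table.
sharing_hdomids = [1, 2]
invalidate_shared_table(tlb, sharing_hdomids)
dev1_after = tlb.lookup(1, 0, 0x1000)
dev2_after = tlb.lookup(2, 0, 0x1000)
```

The cost shows up in invalidate_shared_table: the number of flushes grows with the number of devices sharing the page table, which is the price of keeping the tags distinct.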

Thanks,
Suravee