RE: [PATCH RFCv1 08/14] iommufd: Add IOMMU_VIOMMU_SET_DEV_ID ioctl

From: Tian, Kevin
Date: Tue May 28 2024 - 22:58:27 EST


> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Wednesday, May 29, 2024 4:23 AM
>
> On Mon, May 27, 2024 at 01:08:43AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Friday, May 24, 2024 9:19 PM
> > >
> > > On Fri, May 24, 2024 at 07:13:23AM +0000, Tian, Kevin wrote:
> > > > I'm curious to learn the real reason of that design. Is it because you
> > > > want to do certain load-balance between viommu's or due to other
> > > > reasons in the kernel smmuv3 driver which e.g. cannot support a
> > > > viommu spanning multiple pSMMU?
> > >
> > > Yeah, there is no concept of support for a SMMUv3 instance where it's
> > > command Q's can only work on a subset of devices.
> > >
> > > My expectation was that VIOMMU would be 1:1 with physical iommu
> > > instances, I think AMD needs this too??
> > >
> >
> > Yes this part is clear now regarding to VCMDQ.
> >
> > But Nicoline said:
> >
> > "
> > One step back, even without VCMDQ feature, a multi-pSMMU setup
> > will have multiple viommus (with our latest design) being added
> > to a viommu list of a single vSMMU's. Yet, vSMMU in this case
> > always traps regular SMMU CMDQ, so it can do viommu selection
> > or even broadcast (if it has to).
> > "
> >
> > I don't think there is an arch limitation mandating that?
>
> What I mean is for regular vSMMU. Without VCMDQ, a regular vSMMU
> on a multi-pSMMU setup will look like (e.g. three devices behind
> different SMMUs):
> |<------ VMM ------->|<------ kernel ------>|
> |-- viommu0 --|-- pSMMU0 --|
> vSMMU--|-- viommu1 --|-- pSMMU1 --|--s2_hwpt
> |-- viommu2 --|-- pSMMU2 --|
>
> And device would attach to:
> |<---- guest ---->|<--- VMM --->|<- kernel ->|
> |-- dev0 --|-- viommu0 --|-- pSMMU0 --|
> vSMMU--|-- dev1 --|-- viommu1 --|-- pSMMU1 --|
> |-- dev2 --|-- viommu2 --|-- pSMMU2 --|
>
> When trapping a device cache invalidation: it is straightforward
> by deciphering the virtual device ID to pick the viommu that the
> device is attached to.

I understand how above works.

My question is why that option is chosen instead of going with 1:1
mapping between vSMMU and viommu i.e. letting the kernel to
figure out which pSMMU should be sent an invalidation cmd to, as
how VT-d is virtualized.

I want to know whether doing so is simply to be compatible with
what VCMDQ requires, or due to another untold reason.

>
> When doing iotlb invalidation, a command may or may not contain
> an ASID (a domain ID, and nested domain in this case):
> a) if a command doesn't have an ASID, VMM needs to broadcast the
> command to all viommus (i.e. pSMMUs)
> b) if a command has an ASID, VMM needs to initially maintain an
> S1 HWPT list by linking an ASID when adding an HWPT entry to
> the list, by deciphering vSTE and its linked CD. Then it needs
> to go through the S1 list with the ASID in the command, and to
> find all corresponding HWPTs to issue/broadcast the command.
>
> Thanks
> Nicolin