RE: [PATCH] iommufd: Enforce IOMMU_RESV_SW_MSI upon hwpt_paging allocation

From: Tian, Kevin
Date: Wed Jul 31 2024 - 03:46:40 EST


> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Monday, July 29, 2024 7:51 AM
>
> IOMMU_RESV_SW_MSI is a unique region defined by an IOMMU driver. Though it
> is eventually used by a device for address translation to an MSI location
> (including nested cases), practically it is a universal region across all
> domains allocated for the IOMMU that defines it.
>
> Currently IOMMUFD core fetches and reserves the region during an attach to
> an hwpt_paging. It works with a hwpt_paging-only case, but might not work
> with a nested case where a device could directly attach to a hwpt_nested,
> bypassing the hwpt_paging attachment.

This probably needs a bit more context. IIUC it's the ARM-side choice:
instead of letting the VMM emulate a vITS for S1 and then map it to the
physical ITS range in S2, it relies on the kernel to continue the MSI
cookie reservation in S2 and then expects the guest to identity-map
it in S1.

With that context, if a device is directly attached to a hwpt_nested,
the hwpt_paging attachment is bypassed, including the MSI doorbell
setup on the parent S2, so MSI is broken.

> @@ -364,7 +305,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
> }
>
> if (hwpt_is_paging(hwpt)) {
> - rc = iommufd_hwpt_paging_attach(to_hwpt_paging(hwpt), idev);
> + rc = iopt_table_enforce_dev_resv_regions(
> + &to_hwpt_paging(hwpt)->ioas->iopt, idev->dev);

Is it simpler to extend the original operation to the parent S2 when
it's hwpt_nested?

The name iommufd_hwpt_paging_attach() is a bit misleading. The
actual work there is all about reservations. It doesn't change any
tracking structure about attachment between device and hwpt.
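
To make the suggestion concrete, here is a rough sketch of what "extend
the original operation to the parent S2" could look like. It is a
userspace toy model, not kernel code: every type and helper below
(ioas_model, hwpt_model, dev_model, hwpt_resv_target(), ioas_reserve(),
hwpt_attach_model()) is hypothetical and only mirrors the idea that the
reservation always lands on the paging hwpt, whichever hwpt the device
attaches to.

```c
/* Userspace toy model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RESV 8

struct resv_region { uint64_t start, last; };   /* inclusive range */

struct ioas_model {
	struct resv_region resv[MAX_RESV];
	size_t nr_resv;
};

enum hwpt_type { HWPT_PAGING, HWPT_NESTED };

struct hwpt_model {
	enum hwpt_type type;
	struct ioas_model *ioas;     /* set for HWPT_PAGING */
	struct hwpt_model *parent;   /* set for HWPT_NESTED: the S2 paging hwpt */
};

struct dev_model { struct resv_region sw_msi; };

/* Pick the paging hwpt whose IOAS must carry the device's reservation. */
static struct hwpt_model *hwpt_resv_target(struct hwpt_model *hwpt)
{
	struct hwpt_model *paging =
		hwpt->type == HWPT_NESTED ? hwpt->parent : hwpt;

	assert(paging && paging->type == HWPT_PAGING && paging->ioas);
	return paging;
}

/* Record a reserved range once; returns 0 on success, -1 if the table is full. */
static int ioas_reserve(struct ioas_model *ioas, struct resv_region r)
{
	for (size_t i = 0; i < ioas->nr_resv; i++)
		if (ioas->resv[i].start == r.start && ioas->resv[i].last == r.last)
			return 0;
	if (ioas->nr_resv == MAX_RESV)
		return -1;
	ioas->resv[ioas->nr_resv++] = r;
	return 0;
}

static bool ioas_is_reserved(const struct ioas_model *ioas, uint64_t iova)
{
	for (size_t i = 0; i < ioas->nr_resv; i++)
		if (iova >= ioas->resv[i].start && iova <= ioas->resv[i].last)
			return true;
	return false;
}

/* Same reservation step for both cases: nested attach reserves on the parent S2. */
static int hwpt_attach_model(struct hwpt_model *hwpt, const struct dev_model *dev)
{
	return ioas_reserve(hwpt_resv_target(hwpt)->ioas, dev->sw_msi);
}
```

In this model, attaching a device to a nested hwpt whose parent is a
paging hwpt leaves the device's SW_MSI range reserved in the parent's
IOAS, and attaching the same device to the paging hwpt afterwards does
not add a second entry.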

The only downside is that the reserved regions of dev1 (attached to
hwpt_nested) are unnecessarily added to the S2, which might be directly
attached only by dev2, so the available ranges for dev2 are
unnecessarily shrunk.

But I'm not sure that would be a real problem in practice, given that
1) no usage yet comes close to using up the entire IOVA space, and
2) the guest may change the viommu mode to switch between nested
and paging, so the VMM has to take all devices' reserved regions
into consideration anyway when composing the GPA space.

With that, I think continuing this per-device reservation scheme is
easier than adding a specific reservation for SW_MSI at hwpt creation
time and then further requiring a check at attach time to verify
that the attached device has the same SW_MSI address as the one
reserved during allocation.
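
For comparison, the alternative described above would need something
like the following extra bookkeeping. Again this is only a userspace
sketch under my own assumptions: hwpt_paging_model, dev_model,
hwpt_alloc_model() and hwpt_attach_check_model() are hypothetical names,
and the real patch may structure this differently.

```c
/* Userspace toy model only; none of these names exist in the kernel. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct sw_msi_region { uint64_t start, length; };

struct dev_model {
	bool has_sw_msi;
	struct sw_msi_region sw_msi;
};

struct hwpt_paging_model {
	bool has_sw_msi;
	struct sw_msi_region sw_msi;   /* recorded at allocation time */
};

/* Allocation time: remember the SW_MSI region of the allocating device. */
static void hwpt_alloc_model(struct hwpt_paging_model *hwpt,
			     const struct dev_model *dev)
{
	assert(!dev->has_sw_msi || dev->sw_msi.length > 0);
	hwpt->has_sw_msi = dev->has_sw_msi;
	hwpt->sw_msi = dev->sw_msi;
}

/* Attach time: the extra check that a later device matches the recorded region. */
static int hwpt_attach_check_model(const struct hwpt_paging_model *hwpt,
				   const struct dev_model *dev)
{
	if (!dev->has_sw_msi)
		return 0;
	if (!hwpt->has_sw_msi ||
	    hwpt->sw_msi.start != dev->sw_msi.start ||
	    hwpt->sw_msi.length != dev->sw_msi.length)
		return -EINVAL;
	return 0;
}
```

The point of the sketch is only that the allocation-time approach
carries state on the hwpt plus a comparison on every attach, whereas the
per-device scheme reserves whatever the attaching device reports.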