Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper

From: Robin Murphy
Date: Thu Mar 09 2023 - 07:51:33 EST


On 2023-03-09 10:53, Nicolin Chen wrote:
The nature of ITS virtualization on ARM is done via hypercalls, so kernel
handles all IOVA mappings for the MSI doorbell in iommu_dma_prepare_msi()
and iommu_dma_compose_msi_msg(). The current virtualization solution with
a 2-stage nested translation setup is to do 1:1 IOVA mappings at stage-1
guest-level IO page table via a RMR region in guest-level IORT, aligning
with an IOVA region that's predefined and mapped in the host kernel:

[stage-2 host level]
#define MSI_IOVA_BASE 0x8000000
#define MSI_IOVA_LENGTH 0x100000
...
iommu_get_msi_cookie():
cookie->msi_iova = MSI_IOVA_BASE;
...
iommu_dma_prepare_msi(its_pa):
domain = iommu_get_domain_for_dev(dev);
iommu_dma_get_msi_page(its_pa, domain):
cookie = domain->iova_cookie;
iova = iommu_dma_alloc_iova():
return cookie->msi_iova - size;
iommu_map(iova, its_pa, ...);

[stage-1 guest level]
// Define in IORT a RMR [MSI_IOVA_BASE, MSI_IOVA_LENGTH]
...
iommu_create_device_direct_mappings():
iommu_map(iova=MSI_IOVA_BASE, pa=MSI_IOVA_BASE, len=MSI_IOVA_LENGTH);

This solution calling iommu_get_domain_for_dev() needs the device to get
attached to a host-level iommu_domain that has the msi_cookie.

On the other hand, IOMMUFD designs two iommu_domain objects to represent
the two stages: a stage-1 domain (IOMMU_DOMAIN_NESTED type) and a stage-2
domain (IOMMU_DOMAIN_UNMANAGED type). In this design, the device will be
attached to the stage-1 domain representing a guest-level IO page table,
or a Context Descriptor Table in SMMU's term.

This is obviously a mismatch, as the iommu_get_domain_for_dev() does not
return the correct domain pointer in iommu_dma_prepare_msi().

Add an iommu_get_unmanaged_domain helper to allow drivers to return the
correct IOMMU_DOMAIN_UNMANAGED iommu_domain having the IOVA mappings for
the msi_cookie. Keep it in the iommu-priv header for internal use only.

Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
Signed-off-by: Nicolin Chen <nicolinc@xxxxxxxxxx>
---
drivers/iommu/dma-iommu.c | 5 +++--
drivers/iommu/iommu-priv.h | 15 +++++++++++++++
include/linux/iommu.h | 2 ++
3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 99b2646cb5c7..6b0409d0ff85 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -31,6 +31,7 @@
#include <linux/vmalloc.h>
#include "dma-iommu.h"
+#include "iommu-priv.h"
struct iommu_dma_msi_page {
struct list_head list;
@@ -1652,7 +1653,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
{
struct device *dev = msi_desc_to_dev(desc);
- struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+ struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);

This still doesn't make sense - most of the time this will be expected to return the default DMA/identity domain if that's what the device is currently using. We can't know whether the current domain is managed or not until we look at it.

Just like every other caller of iommu_get_domain_for_dev(), what we want here is the current kernel-owned domain that we can inspect and maybe do standard IOMMU API things with. Why can't iommu_get_domain_for_dev() simply maintain that established usage model and return the kernel-owned s2_domain from a nested domain automatically? No IOMMU API user expects or needs it to return anything else (and IOMMUFD should certainly not be losing track of a nested domain within its own higher-level abstractions and needing to fall back on iommu_get_domain_for_dev()), so I really don't see a valid reason to overcomplicate things.

Please note I stress "valid" since I'm not buying arbitrarily made-up conceptual purity arguments. A nested domain cannot be the "one true domain" that is an opaque combination of S1+S2; the IOMMU API view has to be more like the device is attached to both the nested domain and the parent stage 2 domain somewhat in parallel. Even when nesting is active, the S2 domain still exists as a domain in its own right, and still needs to be visible and operated on as such, for instance if memory is hotplugged in or out of the VM.

TBH I'd also move the s2_domain pointer into the iommu_domain itself, since it's going to be a common feature for all nesting implementations, thus there seems little need to indirect lookups through the drivers at all.

Thanks,
Robin.

struct iommu_dma_msi_page *msi_page;
static DEFINE_MUTEX(msi_prepare_lock); /* see below */
@@ -1685,7 +1686,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
{
struct device *dev = msi_desc_to_dev(desc);
- const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+ const struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
const struct iommu_dma_msi_page *msi_page;
msi_page = msi_desc_get_iommu_cookie(desc);
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index a6e694f59f64..da8044da9ad8 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -15,6 +15,21 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
return dev->iommu->iommu_dev->ops;
}
+static inline struct iommu_domain *iommu_get_unmanaged_domain(struct device *dev)
+{
+ const struct iommu_ops *ops;
+
+ if (!dev->iommu || !dev->iommu->iommu_dev)
+ goto attached_domain;
+
+ ops = dev_iommu_ops(dev);
+ if (ops->get_unmanaged_domain)
+ return ops->get_unmanaged_domain(dev);
+
+attached_domain:
+ return iommu_get_domain_for_dev(dev);
+}
+
int iommu_group_replace_domain(struct iommu_group *group,
struct iommu_domain *new_domain);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 080278c8154d..76c65cc4fc15 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -275,6 +275,8 @@ struct iommu_ops {
struct iommu_domain *parent,
const void *user_data);
+ struct iommu_domain *(*get_unmanaged_domain)(struct device *dev);
+
struct iommu_device *(*probe_device)(struct device *dev);
void (*release_device)(struct device *dev);
void (*probe_finalize)(struct device *dev);