Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
From: Nicolin Chen
Date: Thu Nov 13 2025 - 15:36:40 EST
On Wed, Nov 12, 2025 at 06:25:03PM +0000, Suravee Suthikulpanit wrote:
> @@ -38,10 +40,42 @@ size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type vi
> int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
> const struct iommu_user_data *user_data)
> {
> + unsigned long flags;
> struct protection_domain *pdom = to_pdomain(parent);
> struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
>
> + xa_init(&aviommu->gdomid_array);
Perhaps init with xa_init_flags(&aviommu->gdomid_array, XA_FLAGS_ALLOC1),
since a domain ID can't be 0?
> +static void amd_iommufd_viommu_destroy(struct iommufd_viommu *viommu)
> +{
> + unsigned long flags;
> + struct amd_iommu_viommu *entry, *next;
> + struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
> + struct protection_domain *pdom = aviommu->parent;
> +
> + spin_lock_irqsave(&pdom->lock, flags);
> + list_for_each_entry_safe(entry, next, &pdom->viommu_list, pdom_list) {
> + if (entry == aviommu)
> + list_del(&entry->pdom_list);
> + }
Do we really need the loop? aviommu is already the entry to remove,
so why not simply call list_del(&aviommu->pdom_list)?
> + spin_unlock_irqrestore(&pdom->lock, flags);
> +
> +}
No need for the extra blank line at the end of the function.
> @@ -92,7 +94,60 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> ndom->domain.type = IOMMU_DOMAIN_NESTED;
> ndom->viommu = aviommu;
>
> + gdom_info = kzalloc(sizeof(*gdom_info), GFP_KERNEL);
> + if (!gdom_info)
> + goto out_err;
Missing:
ret = -ENOMEM;
> +
> + /*
> + * Normally, when a guest has multiple pass-through devices,
> + * the IOMMU driver setup DTEs with the same stage-2 table and
> + * use the same host domain ID (hDomId). In case of nested translation,
> + * if the guest setup different stage-1 tables with same PASID,
> + * IOMMU would use the same TLB tag. This will results in TLB
> + * aliasing issue.
> + *
> + * The guest is assigning gDomIDs based on its own algorithm for managing
> + * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
> + * (w/ S2 table) is used by all DTEs. But we need to consistently map the gDomID
> + * to a single hDomID. This is done using an xarray in the vIOMMU to
> + * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
> + * command must be issued for each hDomID in the xarray.
> + */
> + curr = xa_cmpxchg(&aviommu->gdomid_array,
> + ndom->gdom_id, NULL, gdom_info, GFP_ATOMIC);
> + if (curr) {
> + if (xa_err(curr)) {
> + ret = -EINVAL;
> + goto out_err_gdom_info;
> + } else {
> + /* The gDomID already exist */
> + pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
> + __func__, ndom->gdom_id, curr->hdom_id);
> + refcount_inc(&curr->users);
> + ndom->gdom_info = curr;
This looks racy.
When a gDomID is shared between two nested domains, a concurrent
nested_domain_free() could run before this refcount_inc() and call
refcount_dec_and_test(), or even free curr and ndom.
Then this refcount_inc() will blow up, or curr/ndom will be used
after free.
Actually, I don't see where amd_iommu_alloc_domain_nested() gets
used in this series. I assume AMD will use iommufd's vIOMMU
infrastructure directly, which doesn't hold a mutex across nested
domain allocation/free calls.
So, the entire sequence here should hold xa_lock(), use xas_load()
to look up the existing curr, use xas_store() to store gdom_info if
!curr, and call xa_unlock() only after gdom_info is fully initialized.
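A minimal userspace sketch of that locking shape, not kernel code and
not a proposed patch: a pthread mutex stands in for xa_lock(), a fixed
array stands in for the xarray, and a counter stands in for
amd_iommu_pdom_id_alloc(). The names gdom_table, gdom_get() and
gdom_put() are hypothetical and exist only for this illustration. The
point is that lookup, refcount bump, initialization and publication all
happen inside one critical section, and that the free path takes the
same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_GDOM 64	/* hypothetical table size for this model */

struct gdom_info {
	unsigned int hdom_id;	/* host domain ID backing this gDomID */
	unsigned int users;	/* protected by gdom_table.lock */
};

struct gdom_table {
	pthread_mutex_t lock;			/* stands in for xa_lock() */
	struct gdom_info *slot[MAX_GDOM];	/* stands in for the xarray */
	unsigned int next_hdom_id;		/* stands in for the hDomID allocator */
};

static void gdom_table_init(struct gdom_table *t)
{
	pthread_mutex_init(&t->lock, NULL);
	for (int i = 0; i < MAX_GDOM; i++)
		t->slot[i] = NULL;
	t->next_hdom_id = 1;
}

/* Look up gdom_id, or publish a new entry that is already initialized. */
static struct gdom_info *gdom_get(struct gdom_table *t, unsigned int gdom_id)
{
	struct gdom_info *info, *fresh;

	if (gdom_id == 0 || gdom_id >= MAX_GDOM)
		return NULL;

	/* Preallocate outside the lock, as the patch does with kzalloc(). */
	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return NULL;

	pthread_mutex_lock(&t->lock);
	info = t->slot[gdom_id];
	if (info) {
		/* Existing entry: it cannot be freed while we hold the lock. */
		info->users++;
	} else {
		/* New entry: initialize every field before it becomes visible. */
		fresh->hdom_id = t->next_hdom_id++;
		fresh->users = 1;
		t->slot[gdom_id] = fresh;
		info = fresh;
		fresh = NULL;
	}
	pthread_mutex_unlock(&t->lock);

	free(fresh);	/* no-op if the preallocation was consumed */
	return info;
}

/* Drop one reference; returns 1 if the entry was removed and freed. */
static int gdom_put(struct gdom_table *t, unsigned int gdom_id)
{
	struct gdom_info *info;
	int freed = 0;

	if (gdom_id >= MAX_GDOM)
		return 0;

	pthread_mutex_lock(&t->lock);
	info = t->slot[gdom_id];
	if (info) {
		assert(info->users > 0);
		if (--info->users == 0) {
			/* Unpublish and free under the same lock as lookup. */
			t->slot[gdom_id] = NULL;
			free(info);
			freed = 1;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return freed;
}
```

Because both paths serialize on the same lock, a second gdom_get() can
never observe an entry with users == 0, and gdom_put() can never free
an entry between another caller's lookup and its reference bump.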
> + kfree(gdom_info);
> + return &ndom->domain;
> + }
> + }
> +
> + /* The gDomID does not exist. We allocate new hdom_id */
> + gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
> + if (gdom_info->hdom_id <= 0) {
> + xa_cmpxchg(&aviommu->gdomid_array,
> + ndom->gdom_id, gdom_info, NULL, GFP_ATOMIC);
> + ret = -ENOSPC;
> + goto out_err_gdom_info;
> + }
> +
> + refcount_set(&gdom_info->users, 1);
Similar risk here. gdom_info is stored in the xarray before this
line, while users is still 0 from kzalloc(). A concurrent
amd_iommu_alloc_domain_nested() could get the stored gdom_info and
blow up at refcount_inc().
Make sure the entire sequence is locked and safe.
Nicolin