Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation

From: Suthikulpanit, Suravee

Date: Thu Jan 15 2026 - 04:22:01 EST




On 11/14/2025 3:36 AM, Nicolin Chen wrote:
@@ -92,7 +94,60 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
.....
+ /*
+ * Normally, when a guest has multiple pass-through devices,
+ * the IOMMU driver setup DTEs with the same stage-2 table and
+ * use the same host domain ID (hDomId). In case of nested translation,
+ * if the guest setup different stage-1 tables with same PASID,
+ * IOMMU would use the same TLB tag. This will results in TLB
+ * aliasing issue.
+ *
+ * The guest is assigning gDomIDs based on its own algorithm for managing
+ * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
+ * (w/ S2 table) is used by all DTEs. But we need to consistently map the gDomID
+ * to a single hDomID. This is done using an xarray in the vIOMMU to
+ * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
+ * command must be issued for each hDomID in the xarray.
+ */
+ curr = xa_cmpxchg(&aviommu->gdomid_array,
+ ndom->gdom_id, NULL, gdom_info, GFP_ATOMIC);
+ if (curr) {
+ if (xa_err(curr)) {
+ ret = -EINVAL;
+ goto out_err_gdom_info;
+ } else {
+ /* The gDomID already exist */
+ pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, curr->hdom_id);
+ refcount_inc(&curr->users);
+ ndom->gdom_info = curr;
This looks racy..

When a gDomID is shared between two nested domains, a concurrent
nested_domain_free() could enter before refcount_inc(), and call
refcount_dec_and_test() or even free the curr and ndom.

Then, this refcount_inc() will blow up, or curr/ndom will UAF.

Actually, I don't see where amd_iommu_alloc_domain_nested() gets
used in this series.. I assume AMD will use the iommufd's vIOMMU
infrastructure directly which doesn't mutex across nested domain
allocation/free calls.

So, the entire thing here should hold xa_lock(), use xas_load()
for the existing curr and use xas_store() to store gdom_info if
!curr, and xa_unlock() after gdom_info is fully initialized.

+ kfree(gdom_info);
+ return &ndom->domain;
+ }
+ }
+
+ /* The gDomID does not exist. We allocate new hdom_id */
+ gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
+ if (gdom_info->hdom_id <= 0) {
+ xa_cmpxchg(&aviommu->gdomid_array,
+ ndom->gdom_id, gdom_info, NULL, GFP_ATOMIC);
+ ret = -ENOSPC;
+ goto out_err_gdom_info;
+ }
+
+ refcount_set(&gdom_info->users, 1);
Similar risk here. gdom_info is stored to the xarray before this
line. A concurrent amd_iommu_alloc_domain_nested() could get the
stored gdom_info and blow up at refcount_inc().

Make sure the entire thing is locked and safe.

Nicolin

Thanks for pointing this out. I am fixing this in V6 w/ Jason's suggestion to use xa_lock/unlock w/ __xa_cmpxchg.

Thanks,
Suravee