Re: [PATCH rc v6] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
From: Nicolin Chen
Date: Sat Apr 18 2026 - 00:57:13 EST
On Fri, Apr 17, 2026 at 02:44:41PM -0700, Nicolin Chen wrote:
> On Fri, Apr 17, 2026 at 08:24:27AM +0000, Tian, Kevin wrote:
> > One is that iommu_detach_device_pasid() is not blocked, which can
> > trigger devtlb invalidation in the middle of a reset. But it cannot
> > fail, so the right fix is to skip the blocked device in
> > __iommu_remove_group_pasid().
>
> Yea, squashing this:
> @@ -3556,3 +3559,4 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
> for_each_group_device(group, device) {
> - if (device->dev->iommu->max_pasids > 0)
> + /* Device might already be detached for a device recovery */
> + if (!device->blocked && device->dev->iommu->max_pasids > 0)
> iommu_remove_dev_pasid(device->dev, pasid, domain);
>
> > Another is a use-after-free concern upon iommu_detach_device() in the
> > middle of a reset. In my thinking, it will trigger a WARN_ON before
> > any UAF:
> >
> > static void __iommu_group_set_domain_nofail(struct iommu_group *group,
> > struct iommu_domain *new_domain)
> > {
> > WARN_ON(__iommu_group_set_domain_internal(
> > group, new_domain, IOMMU_SET_DOMAIN_MUST_SUCCEED));
> > }
>
> Yes.
>
> > But I haven't had time to think about the fix carefully.
>
> I think we could squash this:
>
> @@ -2469,9 +2469,2 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
>
> - /*
> - * This is a concurrent attach during device recovery. Reject it until
> - * pci_dev_reset_iommu_done() attaches the device to group->domain.
> - */
> - if (group->recovery_cnt)
> - return -EBUSY;
> -
On second thought, we may not want to simply drop this -- IIRC, we
added it specifically to fence a case where gdevs share the same RID,
or some corner case like that?

To be conservative, we can still reject a concurrent attach while
allowing the detach case:
+ /*
+ * This is a concurrent attach during device recovery. Reject it until
+ * pci_dev_reset_iommu_done() attaches the device to group->domain.
+ *
+ * Note: still allow MUST_SUCCEED callers (detach/teardown) through to
+ * avoid UAF on domain release paths.
+ */
+ if (group->recovery_cnt && !(flags & IOMMU_SET_DOMAIN_MUST_SUCCEED))
+ return -EBUSY;
+
In the detach path, it'll move forward, skipping devices with
gdev->blocked set inside the for_each_group_device() loop, and defer
the attach to done().
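To illustrate the combined behavior, here is a userspace sketch (not
kernel code) of the gating above. The names (struct group,
recovery_cnt, blocked, SET_DOMAIN_MUST_SUCCEED) are hypothetical
stand-ins for the kernel structures and IOMMU_SET_DOMAIN_MUST_SUCCEED
in the patch:

```c
/*
 * Userspace sketch, not kernel code: models the recovery gating in
 * __iommu_group_set_domain_internal(). All names here are illustrative
 * stand-ins for the structures in the patch.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SET_DOMAIN_MUST_SUCCEED (1 << 0)

struct gdev {
	bool blocked;		/* device under reset/recovery */
	int domain_id;		/* currently attached domain */
};

struct group {
	int recovery_cnt;	/* nonzero while any member is resetting */
	struct gdev devs[4];
	size_t ndevs;
};

static int group_set_domain(struct group *g, int new_domain,
			    unsigned int flags)
{
	/*
	 * Reject a concurrent attach during recovery, but let the
	 * MUST_SUCCEED (detach/teardown) callers through to avoid UAF
	 * on domain release paths.
	 */
	if (g->recovery_cnt && !(flags & SET_DOMAIN_MUST_SUCCEED))
		return -EBUSY;

	for (size_t i = 0; i < g->ndevs; i++) {
		/*
		 * Blocked devices stay on the blocking domain; the
		 * done() path re-attaches them after the reset.
		 */
		if (g->devs[i].blocked)
			continue;
		g->devs[i].domain_id = new_domain;
	}
	return 0;
}
```

A plain attach during recovery gets -EBUSY, while a MUST_SUCCEED call
proceeds and leaves blocked devices untouched for done() to handle.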
Thanks
Nicolin
> @@ -2484,2 +2477,10 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
> for_each_group_device(group, gdev) {
> + /*
> + * Skip devices under recovery: they are already attached to
> + * group->blocking_domain at the hardware level. When their
> + * reset completes, pci_dev_reset_iommu_done() will re-attach
> + * them to the updated group->domain.
> + */
> + if (gdev->blocked)
> + continue;
> ret = __iommu_device_set_domain(group, gdev->dev, new_domain,
> @@ -2513,2 +2514,4 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
> break;
> + if (gdev->blocked)
> + continue;
> /*
>
>
> > The last one is trivial: goto and guard() shouldn't be mixed in one
> > function, according to the cleanup guidelines.
>
> I don't think this is mixing. The guard protects the entire routine,
> including those goto paths, so there isn't any goto path that ends up
> outside the mutex.
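To illustrate the point: here is a userspace sketch of the scope-guard
idea behind the kernel's guard() macro, built on the same compiler
cleanup attribute. The lock is a toy counter rather than a real mutex,
and all names are illustrative:

```c
/*
 * Userspace sketch, not kernel code: shows why a goto path inside a
 * function-scope guard is still covered by the guard. The "lock" is a
 * toy depth counter; toy_unlock() plays the role of the guard's
 * destructor.
 */
#include <assert.h>

static int lock_depth;	/* toy "mutex": >0 means held */

static void toy_unlock(int **guard)
{
	(void)guard;
	lock_depth--;
}

static int do_work(int fail_early)
{
	int ret = 0;
	/*
	 * Held for the whole function scope, like guard(mutex)(&lock):
	 * toy_unlock() runs on *every* exit, goto-based or not.
	 */
	__attribute__((cleanup(toy_unlock))) int *guard = &lock_depth;
	(*guard)++;

	if (fail_early) {
		ret = -1;
		goto out;	/* goto path is still inside the guard scope */
	}
	/* ... work under the lock ... */
out:
	return ret;	/* toy_unlock() fires here for both paths */
}
```

On either return path the cleanup fires exactly once, which is the
sense in which the goto paths are not "outside the mutex".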
>
> > Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
>
> Thanks!
> Nicolin