RE: [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
From: Tian, Kevin
Date: Tue Mar 31 2026 - 03:21:26 EST
> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Saturday, March 28, 2026 5:09 AM
>
> On Fri, Mar 27, 2026 at 08:27:12AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> > > Sent: Friday, March 20, 2026 5:35 AM
> > >
> > > On Thu, Mar 19, 2026 at 07:14:21PM +0800, Shuai Xue wrote:
> > > > On 3/19/26 12:31 PM, Nicolin Chen wrote:
> > > > > @@ -3961,9 +3962,10 @@ int pci_dev_reset_iommu_prepare(struct
> > > pci_dev *pdev)
> > > > > guard(mutex)(&group->mutex);
> > > > > - /* Re-entry is not allowed */
> > > > > - if (WARN_ON(group->resetting_domain))
> > > > > - return -EBUSY;
> > > > > + if (group->resetting_domain) {
> > > > > + group->reset_cnt++;
> > > > > + return 0;
> > > > > + }
> > > > > ret = __iommu_group_alloc_blocking_domain(group);
> > > >
> > > > pci_dev_reset_iommu_prepare/done() have NO singleton group check.
> > > > They operate on the specific pdev passed in, but use group-wide
> > > > state (resetting_domain, reset_cnt) to track the reset lifecycle.
> > > >
> > > > Interestingly, the broken_worker in patch 3 of the ATC timeout
> > > > series DOES have an explicit singleton check:
> > > >
> > > > if (list_is_singular(&group->devices)) {
> > > > /* Note: only support group with a single device */
> > > >
> > > > This reveals an implicit assumption: the entire prepare/done
> > > > mechanism works correctly only for singleton groups. For
> > > > multi-device groups:
> > > >
> > > > - prepare() only detaches the specific pdev, leaving other
> > > > devices in the group still attached to the original domain
> >
> > no, attach is per-group. all devices are changed together.
>
> I think prepare() should use the per-device ops:
> ops->attach_dev (via __iommu_attach_device)
> ops->set_dev_pasid
> right?
it's per-device op but the condition of calling it is per-group:
if (group->domain != group->blocking_domain) {
ret = __iommu_attach_device(group->blocking_domain, &pdev->dev,
group->domain);
if (ret)
return ret;
}
>
> And it is intended to limit to the resetting device only. Other
> devices in the group can still stay in the attached domain, but
> they cannot do new attachment (-EBUSY) because that's per-group.
how could other devices stay in the attached domain?
>
> > > > - The group-wide resetting_domain/reset_cnt state can be
> > > > corrupted by concurrent resets on different devices (as
> > > > discussed above)
> > >
> > > That's a phenomenal catch!
> > >
> > > I think we shall have a reset_ndevs in the group and a reset_depth
> > > in the gdev.
> > >
> >
> > I didn't get how the state might be corrupted.
> >
> > If there are two devices in a group both being reset, you only
> > want to switch back to the original domain after both devices
> > have finished reset.
> >
> > so a reset_cnt is sufficient to cover both scenarios:
> >
> > - nested prepare/done in a single device resetting path
> > - concurrent prepare/done in multiple devices resetting paths
>
> Since prepare() is per-device, the per-group reset_cnt wouldn't
> be sufficient if one device is resetting while the other isn't.
>
> iommu_driver_get_domain_for_dev() for example would return the
> blocking_domain for both devices, but physically only the first
> device is attached to blocking_domain.
what do you mean by 'physically'?
>
> > v4 is way too complex. Probably it's required in the following
> > ATS timeout series (which I haven't thought about carefully).
> > but for the fix here seems a simple logic like v2 does is ok?
>
> I think v4 is the way to go, unless we fundamentally limit this
> API (and the followup ATS timeout series) to singleton groups.
currently we don't have any per-device resetting logic. As a rc
fix I didn't see the point of making it complex.
>
> FWIW, I have the ATS timeout series updated (waiting for some
> finalization of this base patch):
> https://github.com/nicolinc/iommufd/commits/smmuv3_atc_timeout-v3/
> The new iommu_report_device_broken() is changed to per-gdev.
>