RE: [PATCH v5 5/6] iommu: Support mm PASID 1:n with sva domains

From: Zhang, Tina
Date: Wed Oct 11 2023 - 03:09:25 EST


Hi Nicolin,

The v6 version has submitted: https://lore.kernel.org/linux-iommu/20231011065132.102676-1-tina.zhang@xxxxxxxxx/

In v6 version, we make the iommu_sva_lock holding as a precondition for iommu_alloc_mm_data(). Although I think the issue you met probably wasn't caused by the iommu_sva_lock holding logic in iommu_sva_bind_device() of v5 patch-set. I guess it's worth a try.

Regards,
-Tina

> -----Original Message-----
> From: Zhang, Tina
> Sent: Tuesday, October 10, 2023 7:22 PM
> To: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Cc: Jason Gunthorpe <jgg@xxxxxxxx>; Tian, Kevin <kevin.tian@xxxxxxxxx>; Lu
> Baolu <baolu.lu@xxxxxxxxxxxxxxx>; Michael Shavit <mshavit@xxxxxxxxxx>;
> Vasant Hegde <vasant.hegde@xxxxxxx>; iommu@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v5 5/6] iommu: Support mm PASID 1:n with sva domains
>
> Hi Nicolin,
>
> On 10/6/23 16:07, Nicolin Chen wrote:
> > Hi Tina,
> >
> > On Sun, Sep 24, 2023 at 07:38:12PM -0700, Tina Zhang wrote:
> >
> >> Each mm bound to devices gets a PASID and corresponding sva domains
> >> allocated in iommu_sva_bind_device(), which are referenced by
> >> iommu_mm field of the mm. The PASID is released in __mmdrop(), while
> >> a sva domain is released when no one is using it (the reference count
> >> is decremented in iommu_sva_unbind_device()). However, although sva
> >> domains and their PASID are separate objects such that their own life
> >> cycles could be handled independently, an enqcmd use case may require
> >> releasing the PASID in releasing the mm (i.e., once a PASID is
> >> allocated for a mm, it will be permanently used by the mm and won't
> >> be released until the end of mm) and only allows to drop the PASID
> >> after the sva domains are released. To this end, mmgrab() is called
> >> in iommu_sva_domain_alloc() to increment the mm reference count and
> >> mmdrop() is invoked in
> >> iommu_domain_free() to decrement the mm reference count.
> >>
> >> Since the required info of PASID and sva domains is kept in struct
> >> iommu_mm_data of a mm, use mm->iommu_mm field instead of the old
> >> pasid field in mm struct. The sva domain list is protected by
> iommu_sva_lock.
> >>
> >> Besides, this patch removes mm_pasid_init(), as with the introduced
> >> iommu_mm structure, initializing mm pasid in mm_init() is unnecessary.
> >>
> >> Reviewed-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> >> Reviewed-by: Vasant Hegde <vasant.hegde@xxxxxxx>
> >> Signed-off-by: Tina Zhang <tina.zhang@xxxxxxxxx>
> >
> >> @@ -128,8 +142,9 @@ void iommu_sva_unbind_device(struct iommu_sva
> *handle)
> >> struct device *dev = handle->dev;
> >>
> >> mutex_lock(&iommu_sva_lock);
> >> + iommu_detach_device_pasid(domain, dev, pasid);
> >> if (--domain->users == 0) {
> >> - iommu_detach_device_pasid(domain, dev, pasid);
> >> + list_del(&domain->next);
> >> iommu_domain_free(domain);
> >> }
> >> mutex_unlock(&iommu_sva_lock); @@ -209,4 +224,5 @@ void
> >> mm_pasid_drop(struct mm_struct *mm)
> >> return;
> >>
> >> iommu_free_global_pasid(mm_get_pasid(mm));
> >> + kfree(mm->iommu_mm);
> >
> > I ran some SVA tests by applying this series on top of my local
> > SMMUv3 tree, though it is not exactly a vanilla mainline tree.
> > And I see a WARN_ON introduced by this patch (did git-bisect):
> >
> > [ 364.237319] ------------[ cut here ]------------ [ 364.237328]
> > ida_free called for id=12 which is not allocated.
> > [ 364.237346] WARNING: CPU: 2 PID: 11003 at lib/idr.c:525
> > ida_free+0x10c/0x1d0 ....
> > [ 364.237415] pc : ida_free+0x10c/0x1d0 [ 364.237416] lr :
> > ida_free+0x10c/0x1d0 ....
> > [ 364.237439] Call trace:
> > [ 364.237440] ida_free+0x10c/0x1d0
> > [ 364.237442] iommu_free_global_pasid+0x30/0x50 [ 364.237449]
> > mm_pasid_drop+0x44/0x70 [ 364.237452] __mmdrop+0xf4/0x210 [
> > 364.237457] finish_task_switch.isra.0+0x238/0x2e8
> > [ 364.237460] schedule_tail+0x1c/0x1b8 [ 364.237462]
> > ret_from_fork+0x4/0x20 [ 364.237466] irq event stamp: 0 [
> > 364.237467] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [
> > 364.237470] hardirqs last disabled at (0): [<ffffc0c16022e558>]
> > copy_process+0x770/0x1c78 [ 364.237473] softirqs last enabled at
> > (0): [<ffffc0c16022e558>] copy_process+0x770/0x1c78 [ 364.237475]
> > softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 364.237476]
> > ---[ end trace 0000000000000000 ]---
> >
> > I haven't traced it closely to see what's wrong, due to some other
> > tasks. Yet, if you have some idea about this or something that you
> > want me to try, let me know.
> Thanks for reporting this issue. I did some sva tests, but didn't run into this
> issue. I'm going to try more cases and let you know if I can find anything
> interesting.
>
>
> Regards,
> -Tina
>
> >
> > Thanks
> > Nicolin