Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

From: Zhangfei Gao
Date: Tue Apr 26 2022 - 00:28:26 EST


Hi, Jean

On 2022/4/26 上午12:13, Jean-Philippe Brucker wrote:
Hi Jacob,

On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote:
Hi Jean-Philippe,

On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
<jean-philippe@xxxxxxxxxx> wrote:

On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:
On 4/25/22 06:53, Jean-Philippe Brucker wrote:
On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei.gao@xxxxxxxxxxx
wrote:
On 5.17
fops_release is called automatically, as well as
iommu_sva_unbind_device. On 5.18-rc1.
fops_release is not called, have to manually call close(fd)
Right that's weird
Looks it is caused by the fix patch, via mmget, which may add
refcount of fd.
Yes indirectly I think: when the process mmaps the queue,
mmap_region() takes a reference to the uacce fd. That reference is
released either by explicit close() or munmap(), or by exit_mmap()
(which is triggered by mmput()). Since there is an mm->fd dependency,
we cannot add a fd->mm dependency, so no mmget()/mmput() in
bind()/unbind().

I guess we should go back to refcounted PASIDs instead, to avoid
freeing them until unbind().
Yeah, this is a bit gnarly for -rc4. Let's just make sure there's
nothing else simple we can do.

How does the IOMMU hardware know that all activity to a given PASID is
finished? That activity should, today, be independent of an mm or a
fd's lifetime.
In the case of uacce, it's tied to the fd lifetime: opening an accelerator
queue calls iommu_sva_bind_device(), which sets up the PASID context in
the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
destroys the PASID context (after the device driver stopped all DMA for
this PASID).

For VT-d, it is essentially the same flow except managed by the individual
drivers such as DSA.
If free() happens before unbind(), we deactivate the PASIDs and suppress
faults from the device. When the unbind finally comes, we finalize the
PASID teardown. It seems we have a need for an intermediate state where
PASID is "pending free"?
Yes we do have that state, though I'm not sure we need to make it explicit
in the ioasid allocator.

Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For Arm
we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
also part of Lu's rework [1].

Move mm_pasid_drop to __mmdrop looks workable.

The nginx works since ioasid is not freed when master exit until nginx stop.

The ioasid does not free immediately when fops_release->unbind finished.
Instead, __mmdrop happens a bit lazy,  which has no issue though
I passed 10000 times exit without unbind test, the pasid allocation is ok.

Thanks


diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..60f417f69367 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -792,6 +792,8 @@ void __mmdrop(struct mm_struct *mm)
        mmu_notifier_subscriptions_destroy(mm);
        check_mm(mm);
        put_user_ns(mm->user_ns);
+       mm_pasid_drop(mm);
        free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1190,7 +1192,6 @@ static inline void __mmput(struct mm_struct *mm)
        }
        if (mm->binfmt)
                module_put(mm->binfmt->module);
-       mm_pasid_drop(mm);
        mmdrop(mm);
 }


Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20220421052121.3464100-9-baolu.lu@xxxxxxxxxxxxxxx/