Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

From: Andy Lutomirski
Date: Fri Sep 24 2021 - 19:04:25 EST

On Fri, Sep 24, 2021, at 9:12 AM, Luck, Tony wrote:
> On Fri, Sep 24, 2021 at 03:18:12PM +0200, Thomas Gleixner wrote:
>> On Thu, Sep 23 2021 at 19:48, Thomas Gleixner wrote:
>> > On Thu, Sep 23 2021 at 09:40, Tony Luck wrote:
>> >
>> > fpu_write_task_pasid() can just grab the pasid from current->mm->pasid
>> > and be done with it.
>> >
>> > The task exit code can just call iommu_pasid_put_task_ref() from the
>> > generic code and not from within x86.
>> But OTOH why do you need a per task reference count on the PASID at all?
>> The PASID is fundamentaly tied to the mm and the mm can't go away before
>> the threads have gone away unless this magically changed after I checked
>> that ~20 years ago.
> It would be possible to avoid a per-task reference to the PASID by
> taking an extra reference when mm->pasid is first allocated using
> the CONFIG_PASID_TASK_REFS you proposed yesterday to define a function
> to take the extra reference, and another to drop it when the mm is
> finally freed ... with stubs to do nothing on architectures that
> don't create per-task PASID context.
> This solution works, but is functionally different from Fenghua's
> proposal for this case:
> Process clones a task
> task binds a device
> task accesses device using PASID
> task unbinds device
> task exits (but process lives on)
> Fenghua will free the PASID because the reference count goes
> back to zero. The "take an extra reference and keep until the
> mm is freed" option would needlessly hold onto the PASID.
> This seems like an odd usage case ... even if it does exist, a process
> that does this may spawn another task that does the same thing.
> If you think it is sufficiently simpler to use the "extra reference"
> option (invoking the "perfect is the enemy of good enough" rule) then we
> can change.

I think the perfect and the good are a bit confused here. If we go for "good", then we have an mm owning a PASID for its entire lifetime. If we want "perfect", then we should actually do it right: teach the kernel to update an entire mm's PASID setting all at once. This isn't *that* hard -- it involves two things:

1. The context switch code needs to resync PASID. Unfortunately, this adds some overhead to every context switch, although a static_branch could minimize it for non-PASID users.
2. A change to an mm's PASID needs to sent an IPI, but that IPI can't touch FPU state. So instead the IPI should use task_work_add() to make sure PASID gets resynced.

And there is still no per-task refcounting.

After all, the not so perfect attempt at perfect in this patch set won't actually work if a thread pool is involved.