Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

From: Andy Lutomirski
Date: Wed Sep 29 2021 - 12:59:46 EST

On 9/29/21 05:28, Thomas Gleixner wrote:
On Wed, Sep 29 2021 at 11:54, Peter Zijlstra wrote:
On Fri, Sep 24, 2021 at 04:03:53PM -0700, Andy Lutomirski wrote:
I think the perfect and the good are a bit confused here. If we go for
"good", then we have an mm owning a PASID for its entire lifetime. If
we want "perfect", then we should actually do it right: teach the
kernel to update an entire mm's PASID setting all at once. This isn't
*that* hard -- it involves two things:

1. The context switch code needs to resync PASID. Unfortunately, this
adds some overhead to every context switch, although a static_branch
could minimize it for non-PASID users.

2. A change to an mm's PASID needs to send an IPI, but that IPI can't
touch FPU state. So instead the IPI should use task_work_add() to
make sure PASID gets resynced.

What do we need 1 for? Any PASID change can be achieved using 2 no?

Basically, call task_work_add() on all relevant tasks [1], then IPI
spray the currently running ones of those, and presto.

[1] it is nigh on impossible to find all tasks sharing an mm in any sane
way due to CLONE_MM && !CLONE_THREAD.
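
To make the task_work plus IPI idea concrete, here is a minimal userspace
model, not kernel code. All model_* names are hypothetical. It assumes the
tasks sharing an mm can simply be walked in an array, which is exactly the
part footnote [1] says is hard in the real kernel. The "IPI" is modeled as
forcing a running task through its return-to-user path right away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; these are not the kernel's types. */
struct model_mm {
	uint32_t pasid;
};

struct model_task {
	struct model_mm *mm;
	uint32_t pasid;		/* per-task copy, stands in for the PASID MSR */
	bool running;		/* currently on a CPU */
	bool resync_pending;	/* stands in for queued task work */
};

/* Return-to-user path: run any pending resync work. */
static void model_exit_to_user(struct model_task *t)
{
	if (t->resync_pending) {
		t->pasid = t->mm->pasid;
		t->resync_pending = false;
	}
}

/* The "IPI" touches no PASID state itself; it only forces the exit path. */
static void model_ipi(struct model_task *t)
{
	model_exit_to_user(t);
}

/* Change an mm's PASID: queue work on every user, kick the running ones. */
static void model_change_pasid(struct model_mm *mm, uint32_t pasid,
			       struct model_task *tasks, size_t n)
{
	assert(mm != NULL);
	mm->pasid = pasid;

	for (size_t i = 0; i < n; i++) {
		struct model_task *t = &tasks[i];

		if (t->mm != mm)
			continue;
		t->resync_pending = true;
		if (t->running)
			model_ipi(t);
	}
}
```

In this model a running task picks up the new value immediately, while a
sleeping one picks it up the next time it passes through
model_exit_to_user().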

Why would we want any of that at all?

Process starts, no PASID assigned.

bind to device -> PASID is allocated and assigned to the mm

some task of the process issues ENQCMD -> #GP -> write PASID MSR

After that the PASID is saved and restored as part of the XSTATE and
there is no extra overhead in context switch or return to user space.

All tasks of the process which never used ENQCMD don't care and their
PASID xstate is in init state.

There is absolutely no point in enforcing that all tasks of the process
have the PASID activated immediately when it is assigned. If they need
it they get it via the #GP fixup and everything just works.
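
The lazy flow described above can be sketched as a small userspace model.
This is an illustration only, not kernel code: the model_* names are
hypothetical, a PASID of 0 is assumed to mean "none allocated", and setting
pasid_activated inside the fixup is an assumption about where that flag
would be updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; these are not the kernel's types. */
struct model_mm {
	uint32_t pasid;			/* assumed: 0 means no PASID allocated */
};

struct model_task {
	struct model_mm *mm;
	bool pasid_activated;		/* set once the fixup has run */
	uint64_t pasid_msr;		/* stands in for per-task PASID xstate */
};

/* Models "bind to device": allocate a PASID and assign it to the mm. */
static void model_bind_device(struct model_mm *mm, uint32_t pasid)
{
	assert(pasid != 0);
	mm->pasid = pasid;
}

/* Models the #GP fixup; returns true if the instruction should be retried. */
static bool model_gp_fixup(struct model_task *t)
{
	if (!t->mm->pasid)
		return false;		/* nothing bound: a genuine fault */
	if (t->pasid_activated)
		return false;		/* already set up: a genuine fault */

	t->pasid_msr = t->mm->pasid;
	t->pasid_activated = true;
	return true;
}

/* Models one ENQCMD: fault at most once, then succeed or fail. */
static bool model_enqcmd(struct model_task *t)
{
	if (t->pasid_msr)
		return true;		/* PASID state already in place */
	if (!model_gp_fixup(t))
		return false;
	return t->pasid_msr != 0;	/* the retried instruction */
}
```

Only a task that actually calls model_enqcmd() pays for the fixup. A sibling
task sharing the same mm keeps pasid_msr at 0, which corresponds to the init
state mentioned above.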

Looking at that patch again, none of this muck in fpu__pasid_write() is
required at all. The whole exception fixup is:

	if (!user_mode(regs))
		return false;

	if (!current->mm->pasid)
		return false;

	if (current->pasid_activated)
		return false;

<-- preemption or BH here: kaboom.

	wrmsrl(MSR_IA32_PASID, current->mm->pasid);

This needs the actual sane fpstate writing helper -- see other email.
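
For illustration, here is one reading of that race as a userspace model. It
is not the helper from the other email and not kernel code. All model_* names
are hypothetical, and the model assumes that preemption or a BH saves the
task's register state to memory, after which a raw MSR write is overwritten
when the saved state is reloaded on return to user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; these are not the kernel's types. */
struct model_fpu {
	bool regs_live;		/* true: CPU registers hold this task's state */
	uint64_t hw_pasid;	/* stands in for the hardware PASID MSR */
	uint64_t saved_pasid;	/* stands in for the in-memory xstate copy */
};

/* Preemption or a BH using the FPU: save registers, then they get reused. */
static void model_preempt(struct model_fpu *f)
{
	if (f->regs_live) {
		f->saved_pasid = f->hw_pasid;
		f->regs_live = false;
	}
	f->hw_pasid = 0;	/* registers now hold someone else's state */
}

/* Return to user space: reload registers from memory if they are stale. */
static void model_return_to_user(struct model_fpu *f)
{
	if (!f->regs_live) {
		f->hw_pasid = f->saved_pasid;
		f->regs_live = true;
	}
}

/* Unsafe: writes the MSR without checking where the task's state lives. */
static void model_write_unsafe(struct model_fpu *f, uint64_t pasid)
{
	f->hw_pasid = pasid;
}

/*
 * Safer: write to wherever the task's state currently lives. Real code
 * would also have to keep preemption and BHs out between the check and
 * the write; the model is single-threaded, so that part is implied.
 */
static void model_write_safe(struct model_fpu *f, uint64_t pasid)
{
	if (f->regs_live)
		f->hw_pasid = pasid;
	else
		f->saved_pasid = pasid;
}
```

With model_preempt() called before the write, model_write_unsafe() loses the
value on the next model_return_to_user(), while model_write_safe() keeps it.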