Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

From: Luck, Tony
Date: Tue Sep 28 2021 - 16:28:29 EST


On Tue, Sep 28, 2021 at 12:19:22PM -0700, Dave Hansen wrote:
> On 9/28/21 11:50 AM, Luck, Tony wrote:
> > On Mon, Sep 27, 2021 at 04:51:25PM -0700, Dave Hansen wrote:
> ...
> >> 1. Hide whether we need to write to real registers
> >> 2. Hide whether we need to update the in-memory image
> >> 3. Hide other FPU infrastructure like the TIF flag.
> >> 4. Make the users deal with a *whole* state in the replace API
> >
> > Is that difference just whether you need to save the
> > state from registers to memory (for the "update" case)
> > or not (for the "replace" case ... where you can ignore
> > the current register, overwrite the whole per-feature
> > xsave area and mark it to be restored to registers).
> >
> > If so, just a "bool full" argument might do the trick?
>
> I want to be able to hide the complexity of where the old state comes
> from. It might be in registers or it might be in memory or it might be
> *neither*. It's possible we're running with stale register state and a
> current->...->xsave buffer that has XFEATURES&XFEATURE_FOO 0.
>
> In that case, the "old" copy might be memcpy'd out of the init_task.
> Or, for pkeys, we might build it ourselves with init_pkru_val.

So should there be an error case if there isn't an "old" state, and
the user calls:

p = begin_update_one_xsave_feature(XFEATURE_something, false);

Maybe instead of an error, just fill it in with the init state for the feature?
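
Roughly what I have in mind, as a user-space toy model rather than kernel code. Everything here is made up for illustration (struct toy_xsave, toy_begin_update(), the fixed 8-byte slots and the init values); the real buffer layout and helpers are different:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an XSAVE buffer (not the real layout). */
#define TOY_NR_FEATURES  4
#define TOY_FEATURE_SIZE 8

struct toy_xsave {
	uint64_t xstate_bv;	/* bit set => slot holds non-init data */
	uint8_t  area[TOY_NR_FEATURES][TOY_FEATURE_SIZE];
};

/* Assumed per-feature init images: all zero except a made-up value for feature 1. */
static const uint8_t toy_init_state[TOY_NR_FEATURES][TOY_FEATURE_SIZE] = {
	[1] = { 0x54, 0x55, 0x55, 0x55 },
};

/*
 * Return a pointer to the feature's slot. For a partial update
 * (full == false) with no "old" state in the buffer, seed the slot with
 * the init image instead of failing or handing back stale bytes. For a
 * full replace the caller overwrites everything, so skip the copy.
 */
static void *toy_begin_update(struct toy_xsave *xs, unsigned int feature, bool full)
{
	if (feature >= TOY_NR_FEATURES)
		return NULL;

	if (!full && !(xs->xstate_bv & (UINT64_C(1) << feature)))
		memcpy(xs->area[feature], toy_init_state[feature], TOY_FEATURE_SIZE);

	return xs->area[feature];
}
```

With that shape the "false" caller never sees an error path, and the "true" caller doesn't pay for a copy it is about to overwrite.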

> > Also - you have a "tsk" argument in your pseudo code. Is
> > this needed? Are there places where we need to perform
> > these operations on something other than "current"?
>
> Two cases come to mind:
> 1. Fork/clone where we are doing things to our child's XSAVE buffer
> 2. ptrace() where we are poking into another task's state
>
> ptrace() goes for the *whole* buffer now. I'm not sure it would need
> this per-feature API. I just call it out as something that we might
> need in the future.

Ok - those seem ok ... it is up to the caller to make sure that the
target task is in some "not running, and can't suddenly start running"
state before calling these functions.

>
> > pseudo-code:
> >
> > void *begin_update_one_xsave_feature(enum xfeature xfeature, bool full)
> > {
> > void *addr;
> >
> > BUG_ON(!(xsave->header.xcomp_bv & (1 << xfeature)));
> >
> > addr = __raw_xsave_addr(xsave, xfeature);
> >
> > fpregs_lock();
> >
> > if (full)
> > return addr;
>
> If the feature is marked as in the init state in the buffer
> (XSTATE_BV[feature]==0), this addr *could* contain total garbage. So,
> we'd want to make sure that the memory contents have the init state
> written before handing them back to the caller. That's not strictly
> required if the user is writing the whole thing, but it's the nice thing
> to do.

Nice guys waste CPU cycles writing to memory that is just going to get
written again.

>
> > if (xfeature registers are "live")
> > xsaves(xstate, 1 << xfeature);
>
> One little note: I don't think we would necessarily need to do an XSAVES
> here. For PKRU, for instance, we could just do a rdpkru.

Like this?

	if (tsk == current) {
		switch (xfeature) {
		case XFEATURE_PKRU:
			*(u32 *)addr = rdpkru();
			break;
		case XFEATURE_PASID:
			rdmsrl(MSR_IA32_PASID, msr);
			*(u64 *)addr = msr;
			break;
		... any other "easy" states ...
		default:
			xsaves(xstate, 1 << xfeature);
			break;
		}
	}

>
> > return addr;
> > }
> >
> > void finish_update_one_xsave_feature(enum xfeature xfeature)
> > {
> > mark feature modified
>
> I think we'd want to do this at the "begin" time. Also, do you mean we
> should set XSTATE_BV[feature]?

Begin? End? It's all inside fpregs_lock(). But whatever seems best.

Yes, I think that this means set XSTATE_BV[feature] ... but I'm
relying on you as the xsave expert to help get the subtle bits right so
that Andy Lutomirski can smile at this code.

> > set TIF bit
>
> Since the XSAVE buffer was updated, it now contains the canonical FPU
> state. It may have diverged from the register state, thus we need to
> set TIF_NEED_FPU_LOAD.

Yes, that's the TIF bit my pseudo-code intended.

> It's also worth noting that we *could*:
>
> xrstors(xstate, 1<<xfeature);
>
> as well. That would bring the registers back up to date and we could
> keep TIF_NEED_FPU_LOAD==0.

Only makes sense if "tsk == current". But does this help? The work seems
to be the same whether we do it now, or later. We don't know for sure
that we will directly return to the task. We might context switch to
another task, so loading the state into registers now would just be
wasted time.
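
To illustrate the deferral argument, here is a small user-space model. All of it is hypothetical: struct toy_task, the need_fpu_load field standing in for TIF_NEED_FPU_LOAD, the single "register", and the bit number chosen for the feature are assumptions made for this sketch, not the kernel's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FEATURE_BIT (UINT64_C(1) << 10)	/* arbitrary bit for the toy feature */

struct toy_task {
	uint64_t xstate_bv;	/* which features hold non-init data in memory */
	uint64_t mem_state;	/* in-memory image of the one toy feature */
	bool need_fpu_load;	/* models TIF_NEED_FPU_LOAD */
};

static uint64_t toy_reg_state;		/* models the live register */
static unsigned int toy_restores;	/* counts memory -> register loads */

/* After the caller has written the in-memory image: mark it valid and stale-in-registers. */
static void toy_finish_update(struct toy_task *t)
{
	t->xstate_bv |= TOY_FEATURE_BIT;
	t->need_fpu_load = true;
}

/* Models the exit-to-user path: load registers only if the flag asks for it. */
static void toy_return_to_user(struct toy_task *t)
{
	if (t->need_fpu_load) {
		toy_reg_state = t->mem_state;
		toy_restores++;
		t->need_fpu_load = false;
	}
}
```

In this model two back-to-back updates followed by one return to user space cost a single register load, and if the task is switched out before returning, the load is simply done whenever it next runs.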

>
> > fpregs_unlock();
> > }

-Tony