Re: [PATCH 1/7] x86/fpu: Simplify the fpu->last_cpu logic and rename it to fpu->fpregs_cached

From: Ingo Molnar
Date: Thu Jan 26 2017 - 09:54:16 EST



* Rik van Riel <riel@xxxxxxxxxx> wrote:

> On Thu, 2017-01-26 at 12:26 +0100, Ingo Molnar wrote:
> >
> > @@ -322,6 +308,16 @@ struct fpu {
> >   unsigned char fpregs_active;
> >  
> >   /*
> > +  * @fpregs_cached:
> > +  *
> > +  * This flag tells us whether this context is loaded into a CPU
> > +  * right now.
>
> Not quite. You are still checking against fpu_fpregs_owner_ctx.
>
> How about something like
>
> * This flag tells us whether this context was loaded into
> * its current CPU; fpu_fpregs_owner_ctx will tell us whether
> * this context is actually in the registers.

That's still not quite accurate: if ->fpregs_cached is 0 and fpu_fpregs_owner_ctx
is still pointing to the FPU structure then the context is not actually in the
registers anymore - it's a stale copy of some past version.

These values simply tell us whether an in-memory FPU context's latest version is
in CPU registers or not: both have to be valid for the in-CPU registers to be
valid and current. The fpu_fpregs_owner_ctx pointer is a per-CPU data structure
that tells us this fact; the ->fpregs_cached flag tells us the same, but it is
placed into the task/fpu structure.

Clearing either of those values invalidates the cache. The point of keeping them
split is implementation efficiency: for some invalidations it's easier to use the
per-cpu structure, for some others (such as ptrace access) it's easier to access
the per-task flag. The FPU switch-in code has easy access to both values so
there's no extra cost from having the cache validity flag split into two parts.
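
To make the two-part validity check concrete, here is a minimal userspace sketch.
It is only a model under stated assumptions, not the kernel implementation: the
model_* types and helpers are hypothetical, and only the names fpregs_cached and
fpu_fpregs_owner_ctx come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of this is actual kernel code. */
struct model_fpu {
    bool fpregs_cached;                      /* per-task half of the validity flag */
};

struct model_cpu {
    struct model_fpu *fpu_fpregs_owner_ctx;  /* per-CPU half of the validity flag */
};

/* The in-CPU registers are valid and current only if both halves agree. */
static inline bool model_fpregs_valid(const struct model_cpu *cpu,
                                      const struct model_fpu *fpu)
{
    return fpu->fpregs_cached && cpu->fpu_fpregs_owner_ctx == fpu;
}

/* Models loading a context into a CPU's registers: sets both halves. */
static inline void model_fpregs_load(struct model_cpu *cpu, struct model_fpu *fpu)
{
    assert(cpu != NULL && fpu != NULL);
    cpu->fpu_fpregs_owner_ctx = fpu;
    fpu->fpregs_cached = true;
}

/* Per-task invalidation, e.g. a ptrace-style write to the in-memory context. */
static inline void model_invalidate_via_task(struct model_fpu *fpu)
{
    fpu->fpregs_cached = false;
}

/* Per-CPU invalidation, e.g. something else clobbering this CPU's registers. */
static inline void model_invalidate_via_cpu(struct model_cpu *cpu)
{
    cpu->fpu_fpregs_owner_ctx = NULL;
}
```

Note that after model_invalidate_via_task() the per-CPU pointer still points at
the context, yet the check fails: this is the "stale copy" case described earlier.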

A consequence of this is that a correct implementation could in theory eliminate
either of the two flags:

- We could use only fpu_fpregs_owner_ctx and remove ->fpregs_cached. In this case
the ptrace codepaths would have to invalidate the fpu_fpregs_owner_ctx pointer,
which requires some care as it's not just a local CPU modification, i.e. a
single cmpxchg() would be required to invalidate the register state.

- Or we could use only ->fpregs_cached and eliminate fpu_fpregs_owner_ctx: this
would be awkward for the kernel_fpu_begin()/end() API codepaths, which have no
easy access to the task that has its FPU context cached in the CPU registers.
(Which might not be the current task.)
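
The second point can be illustrated with a small userspace sketch. It assumes a
simplified model in which in-kernel FPU use only needs to drop the cached state;
the sketch_* names are hypothetical and this is not the kernel's
kernel_fpu_begin() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of this is actual kernel code. */
struct sketch_task {
    bool fpregs_cached;               /* per-task half of the validity flag */
};

struct sketch_cpu {
    struct sketch_task *current;      /* task currently running on this CPU */
    struct sketch_task *owner_ctx;    /* models fpu_fpregs_owner_ctx */
};

/* Both halves must agree for a task's register copy to be usable. */
static inline bool sketch_regs_valid(const struct sketch_cpu *cpu,
                                     const struct sketch_task *t)
{
    return t->fpregs_cached && cpu->owner_ctx == t;
}

/*
 * In-kernel FPU use is about to clobber the registers. A single per-CPU
 * store invalidates whatever context was cached, without having to find
 * the owning task (which may not be cpu->current).
 */
static inline void sketch_kernel_fpu_begin(struct sketch_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->owner_ctx = NULL;
}
```

In this model, a task that is not current but still has its context cached is
invalidated by the per-CPU store alone; with only a per-task flag, the code would
first have to locate that task.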

So I think the best implementation is to have both flags, and to drive each
invalidation from whichever one is the most efficient to access.

What we could do is to unify the naming to explain all this a bit better - right
now there's very little indication that ->fpregs_cached is closely related to
fpu_fpregs_owner_ctx.

For example we could rename them to:

->fpregs_cached => ->fpregs_owner [bool]
fpu_fpregs_owner_ctx => fpregs_owner_ctx [ptr]

?

Clearing ->fpregs_owner or setting fpregs_owner_ctx to NULL invalidates the cache
and it's clear from the naming that the two values are closely related.

Would this work for you?

Thanks,

Ingo