Re: [PATCH 2/2] x86/fpu: split old & new fpu handling into separate functions

From: Dave Hansen
Date: Fri Oct 14 2016 - 13:15:44 EST


On 10/14/2016 05:15 AM, riel@xxxxxxxxxx wrote:
> From: Rik van Riel <riel@xxxxxxxxxx>
>
> By moving all of the new fpu state handling into switch_fpu_finish,
> the code can be simplified some more. This does get rid of the
> prefetch, but given the size of the fpu register state on modern
> CPUs, and the amount of work done by __switch_to in-between both
> functions, the value of a single cache line prefetch seems somewhat
> dubious anyway.
...
> -
> - if (fpu.preload) {
> - if (fpregs_state_valid(new_fpu, cpu))
> - fpu.preload = 0;
> - else
> - prefetch(&new_fpu->state);
> - fpregs_activate(new_fpu);
> - }
> -
> - return fpu;
> }

Yeah, that prefetch is highly dubious. XRSTOR might not even be
_reading_ that cacheline if the state isn't present (xstate->xfeatures
bit is 0). If we had to pick *a* cacheline to prefetch for XRSTOR, it
would be the XSAVE header, *not* the FPU state.
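
To make the point above concrete, here is a minimal sketch in plain userspace C, not kernel code. The struct and function names (xsave_area_sketch, xrstor_reads_component, xrstor_prefetch_target) are hypothetical and only model the standard-format XSAVE layout: a 512-byte legacy region followed by a 64-byte header whose first word is the xfeatures (XSTATE_BV) bitmap. It ignores special cases such as MXCSR handling and the compacted format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 64-byte XSAVE header. */
struct xsave_header_sketch {
	uint64_t xfeatures;	/* XSTATE_BV: which components hold saved state */
	uint64_t xcomp_bv;	/* compaction bitmap, unused in this sketch */
	uint64_t reserved[6];
};

/* Hypothetical model of the start of a standard-format XSAVE area. */
struct xsave_area_sketch {
	uint8_t legacy[512];			/* x87/SSE legacy region */
	struct xsave_header_sketch header;	/* XSAVE header at offset 512 */
	/* extended state components would follow here */
};

static_assert(offsetof(struct xsave_area_sketch, header) == 512,
	      "XSAVE header must follow the 512-byte legacy region");

/*
 * Roughly: does XRSTOR load this component from memory?
 * Only if the component was requested (bit set in rfbm) AND its
 * xfeatures bit is set. If the xfeatures bit is clear, the component
 * is put into its init state and its memory is not needed.
 */
static bool xrstor_reads_component(const struct xsave_area_sketch *area,
				   uint64_t rfbm, unsigned int feature_nr)
{
	uint64_t bit;

	if (feature_nr >= 64)
		return false;

	bit = 1ULL << feature_nr;
	return (rfbm & bit) && (area->header.xfeatures & bit);
}

/*
 * The header is the one cacheline XRSTOR always has to consult,
 * so it is the natural candidate if only one line is prefetched.
 */
static const void *xrstor_prefetch_target(const struct xsave_area_sketch *area)
{
	return &area->header;
}
```

In this model, a buffer with only bit 9 (PKRU) set in xfeatures reports that PKRU would be read and every other component would not, regardless of how many components are requested.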

I made some attempts to optimize the PKRU handling by touching and
prefetching the state before calling XRSTOR. Touching it before the
XRSTOR actually made things _worse_ overall.

It would be ideal to have some data on whether this actually _does_
anything, but I can't imagine it being a real delta in either direction.

Acked-by: Dave Hansen <dave.hansen@xxxxxxxxx>