Re: [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx

From: Borislav Petkov
Date: Wed Nov 27 2019 - 09:08:09 EST


On Wed, Nov 27, 2019 at 01:42:43PM +0100, Sebastian Andrzej Siewior wrote:
> The state/owner of FPU is saved fpu_fpregs_owner_ctx by pointing to the
                                 ^
                                 to

> context that is currently loaded. It never changed during the lifetime
> of a task and remained stable/constant.
>
> Since we deferred loading the FPU registers on return to userland, the

Drop those "we"s :)

> content of fpu_fpregs_owner_ctx may change during preemption and must
> not be cached.
> This went unnoticed for some time and was only noticed now, in particular
> because gcc-9 is able to cache that load in copy_fpstate_to_sigframe() and
> reuse it in the retry loop:
>
> copy_fpstate_to_sigframe()
>   load fpu_fpregs_owner_ctx and save on stack
>   fpregs_lock()
>   copy_fpregs_to_sigframe() /* failed */
>   fpregs_unlock()
>          *** PREEMPTION, another task uses the FPU, changes fpu_fpregs_owner_ctx ***
>
>   fault_in_pages_writeable() /* succeeds, retry */
>
>   fpregs_lock()
>   __fpregs_load_activate()
>     fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
>   copy_fpregs_to_sigframe() /* succeeds, random FPU content */
>
> This is a comparison of the assembly of gcc-9, without vs with this
> patch:
>
> | # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size))
> | cmpq %rdx, %rax # tmp183, _4
> | jb .L190 #,
> |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |-#APP
> |-# 512 "arch/x86/include/asm/fpu/internal.h" 1
> |- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__
> |-# 0 "" 2
> |-#NO_APP
> |- movq %rax, -88(%rbp) # pfo_ret__, %sfp
> …
> |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |- movq -88(%rbp), %rcx # %sfp, pfo_ret__
> |- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp
> |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |+#APP
> |+# 512 "arch/x86/include/asm/fpu/internal.h" 1
> |+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__
> |+# 0 "" 2
> |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> |+#NO_APP
> |+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp
>
> Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching
> fpu_fpregs_owner_ctx across preemption points.
>
> Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace")

Or

a352a3b7b792 ("x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD")

maybe, which adds the fpregs_unlock()?

> ---
>
> There is no Signed-off-by here. Could this please be verified by the
> reporter?

Not the reporter, but I just tested it successfully too:

Tested-by: Borislav Petkov <bp@xxxxxxx>

> Also I would like to add
> Debugged-by: Ian Lance Taylor

Yes, pls. CCed.

>
> but I lack his complete address, and I'm also not sure whether he wants that.
> Also please send a Reported-by line since I'm not sure who started this.
>
> arch/x86/include/asm/fpu/internal.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 4c95c365058aa..44c48e34d7994 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
>
> static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
> {
> - return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> + return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
> }

And to add one more data point from IRC: this is also documented:

/*
* this_cpu_read() makes gcc load the percpu variable every time it is
* accessed while this_cpu_read_stable() allows the value to be cached.
^^^^^^^^^^^^^^^

* this_cpu_read_stable() is more efficient and can be used if its value
* is guaranteed to be valid across cpus. The current users include
* get_current() and get_thread_info() both of which are actually
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
#define this_cpu_read_stable(var) percpu_stable_op("mov", var)


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette