Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency

From: Greg KH
Date: Tue Feb 15 2022 - 11:20:22 EST


On Tue, Feb 15, 2022 at 07:36:44AM -0800, Brian Geffon wrote:
> There are two issues with PKRU handling prior to 5.13. The first is that
> when eagerly switching PKRU we check that current is not a kernel
> thread as kernel threads will never use PKRU. It's possible that
> this_cpu_read_stable() on current_task (ie. get_current()) is returning
> an old cached value. By forcing the read with this_cpu_read() the
> correct task is used. Without this it's possible when switching from
> a kernel thread to a userspace thread that we'll still observe the
> PF_KTHREAD flag and never restore the PKRU. And as a result this
> issue only occurs when switching from a kernel thread to a userspace
> thread, switching from a non kernel thread works perfectly fine because
> all we consider in that situation is the flags from some other non
> kernel task and the next fpu is passed in to switch_fpu_finish().
>
> Without reloading the value finish_fpu_load() after being inlined into
> __switch_to() uses a stale value of current:
>
> ba1: 8b 35 00 00 00 00 mov 0x0(%rip),%esi
> ba7: f0 41 80 4d 01 40 lock orb $0x40,0x1(%r13)
> bad: e9 00 00 00 00 jmp bb2 <__switch_to+0x1eb>
> bb2: 41 f6 45 3e 20 testb $0x20,0x3e(%r13)
> bb7: 75 1c jne bd5 <__switch_to+0x20e>
>
> By using this_cpu_read() and avoiding the cached value the compiler does
> insert an additional load instruction and observes the correct value now:
>
> ba1: 8b 35 00 00 00 00 mov 0x0(%rip),%esi
> ba7: f0 41 80 4d 01 40 lock orb $0x40,0x1(%r13)
> bad: e9 00 00 00 00 jmp bb2 <__switch_to+0x1eb>
> bb2: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax
> bb9: 00
> bba: f6 40 3e 20 testb $0x20,0x3e(%rax)
> bbe: 75 1c jne bdc <__switch_to+0x215>
>
> The second issue is when using write_pkru() we only write to the
> xstate when the feature bit is set because get_xsave_addr() returns
> NULL when the feature bit is not set. This is problematic as the CPU
> is free to clear the feature bit when it observes the xstate in the
> init state, this behavior seems to be documented a few places throughout
> the kernel. If the bit was cleared then in write_pkru() we would happily
> write to PKRU without ever updating the xstate, and the FPU restore on
> return to userspace would load the old value agian.
>
> Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> Signed-off-by: Brian Geffon <bgeffon@xxxxxxxxxx>
> Signed-off-by: Willis Kung <williskung@xxxxxxxxxx>
> Tested-by: Willis Kung <williskung@xxxxxxxxxx>
> ---
> arch/x86/include/asm/fpu/internal.h | 2 +-
> arch/x86/include/asm/pgtable.h | 14 ++++++++++----
> 2 files changed, 11 insertions(+), 5 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>