Re: [RFC PATCH 10/10] x86/fpu: defer FPU state load until return to userspace
From: Andy Lutomirski
Date: Fri Sep 21 2018 - 00:15:42 EST
> On Sep 20, 2018, at 8:45 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
>
>
>> On Sep 19, 2018, at 10:05 AM, Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>>
>> On 2018-09-12 08:47:19 [-0700], Andy Lutomirski wrote:
>>>> --- a/arch/x86/kernel/fpu/core.c
>>>> +++ b/arch/x86/kernel/fpu/core.c
>>>> @@ -101,14 +101,14 @@ void __kernel_fpu_begin(void)
>>>>
>>>> kernel_fpu_disable();
>>>>
>>>> - if (fpu->initialized) {
>>>> + __cpu_invalidate_fpregs_state();
>>>> +
>>>> + if (!test_and_set_thread_flag(TIF_LOAD_FPU)) {
>>>
>>> Since the already-TIF_LOAD_FPU path is supposed to be fast here, use test_thread_flag() instead. test_and_set operations do unconditional RMW operations and are always full barriers, so theyâre slow.
>> okay.
>>
>>> Also, on top of this patch, there should be lots of cleanups available. In particular, all the fpu state accessors could probably be reworked to take TIF_LOAD_FPU into account, which would simplify the callers and maybe even the mess of variables tracking whether the state is in regs.
>>
>> Do you refer to the fpu.initilized check or something else?
>>
>
> I mean the fpu.initialized variable entirely. AFAIK, its only use is for kernel threads â setting it to false lets us switch to a kernel thread and back without saving and restoring. But TIF_LOAD_FPU should be able to replace it: when we have FPU regs loaded and we switch to *any* thread, kernel or otherwise, we can set TIF_LOAD_FPU and leave the old regs loaded. So we donât need the special case for kernel threads.
>
> Which reminds me: if you havenât already done so, can you add a helper to sanity check the current context? It should check that the combination of owner_ctx, last_cpu, and TIF_LOAD_FPU is sane. For example, if owner_ctx or last_cpu is says the cpu regs are invalid for current but TIF_LOAD_FPU is clear, it should warn. I think that at least switch_fpu_finish should call it. Arguably switch_fpu_prepare should too, at the beginning.
Looking some more, the âpreloadâ variable needs to go away or be renamed. It hasnât had anything to do with preloading for some time.
Also, the interaction between TIF_LOAD_FPU and FPU emulation needs to be documented somewhere. Probably FPU-less systems should never have TIF_LOAD_FPU set.
Or we could decide that no one uses FPU emulation any more.