Re: [PATCH] x86/lib: don't use MMX before FPU initialization

From: Krzysztof Mazur
Date: Thu Jan 14 2021 - 19:06:40 EST


On Thu, Jan 14, 2021 at 08:31:30AM -0800, Andy Lutomirski wrote:
> This is gross. I realize this is only used for old CPUs that we don't
> care about perf-wise

Performance might be still important for embedded systems (Geode LX
seems to be supported "until at least 2021").

> , but this code is nonsense -- it makes absolutely
> to sense to put this absurd condition here to work around
> kernel_fpu_begin() bugs.

Yes, it's nonsense. But I think that the kernel might have a silent
assumption, that the FPU cannot be used too early.

> If we want to use MMX, we should check MMX.
> And we should also check the correct condition re: irqs. So this code
> should be:
>
> if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) {
> kernel_fpu_begin_mask(FPU_MASK_XMM);
> ...

This code does not use SSE (XMM). It uses only MMX and 3DNow!, so XMM
check is not needed here. And this code is enabled only when a processor
with 3DNow! is selected ("lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o"
in Makefile). So:

if (!irq_fpu_usable())
fallback_to_non_mmx_code();

is sufficient.

But the original:

if (!in_interrupt())
fallback_to_non_mmx_code();

is also valid. Changing it to irq_fpu_usable() might allow using
MMX variant in more cases and even improve performance. But it
can also introduce some regressions.

> or we could improve code gen by adding a try_kernel_fpu_begin_mask()
> so we can do a single call instead of two.
> [...]
> What do you all think? If you're generally in favor, I can write the
> code and do a quick audit for other weird cases in the kernel.

I think that the cleanest approach would be introducing fpu_usable(),
which returns 0 (or false) if FPU is not initialized yet. Then
irq_fpu_usable() could be changed to:

bool irq_fpu_usable(void)
{
return fpu_usable() && (!in_interrupt() ||
interrupted_user_mode() ||
interrupted_kernel_fpu_idle());
}

and in the _mmx_memcpy():

- if (unlikely(in_interrupt()))
+ if (unlikely(irq_fpu_usable()))
return __memcpy(to, from, len);


try_kernel_fpu_begin(), even without mask, is also a good idea.
If the FPU is not available yet (FPU not initizalized yet) it can fail
and a fallback code would be used. However, in some cases fpu_usable()
+ kernel_fpu_begin() might be better, if between fpu_usable() check
and kernel_fpu_begin() long preparation is required. (kernel_fpu_begin()
disables preemption). In the mmx_32.c case try_kernel_fpu_begin()
looks good, only two simple lines are added to a section with disabled
preemption:

diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..9c6dadd2b2ef 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -31,14 +31,12 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
void *p;
int i;

- if (unlikely(in_interrupt()))
+ if (unlikely(try_kernel_fpu_begin()))
return __memcpy(to, from, len);

p = to;
i = len >> 6; /* len/64 */

- kernel_fpu_begin();
-
__asm__ __volatile__ (
"1: prefetch (%0)\n" /* This set is 28 bytes */
" prefetch 64(%0)\n"



And if try_kernel_fpu_begin_mask() with a mask will allow for some
optimizations it also might be a good idea.

> Or we could flip the OSFSXR bit very early, I suppose.

I think that my original static_branch_likely() workaround might
be the simplest and touches only mmx_32.c. Such approach is already
used in the kernel, for instance in the crypto code:

$ git grep static_branch arch/x86/crypto/


And maybe using FPU too early should be forbidden.

Best regards,
Krzysiek