Re: [PATCH] x86, mem: move memmove to out of line assembler

From: Nick Desaulniers
Date: Fri Sep 23 2022 - 13:55:36 EST


Dropping Ma, emails bouncing.

On Fri, Sep 23, 2022 at 10:30 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Sep 23, 2022 at 10:02 AM Nick Desaulniers
> <ndesaulniers@xxxxxxxxxx> wrote:
> >
> > memmove is quite large and probably shouldn't be inlined due to size
> > alone. A noinline function attribute would be the simplest fix, but
> > there's a few things that stand out with the current definition:
>
> I don't think your patch is wrong, and it's not that different from
> what the x86-64 code did long ago back in 2011 in commit 9599ec0471de
> ("x86-64, mem: Convert memmove() to assembly file and fix return value
> bug")
>
> But I wonder if we might be better off getting rid of that horrid
> memmove thing entirely.

We could remove __HAVE_ARCH_MEMMOVE from
arch/x86/include/asm/string_32.h for ARCH=i386 then rip this
arch-specific definition of memmove out.

Might performance regressions be a concern with that approach?

I'll write up a patch for that just to have on hand, and leave the
decision up to someone else.

> The original thing seems to be from 2010,
> where commit 3b4b682becdf ("x86, mem: Optimize memmove for small size
> and unaligned cases") did this thing for both 32-bit and 64-bit code.
>
> And it's really not likely used all that much - memcpy is the *much*
> more important function, and that's the one that has actually been
> updated for modern CPU's instead of looking like it's some copy from
> glibc or whatever.

I suspect that's probably where the duplicate 3 label comes from:
likely some macros were expanded from another codebase's
implementation, then copied into the kernel sources.

>
> To make things even stranger, on the x86-64 side, we actually have
> *two* copies of 'memmove()' - it looks like memcpy_orig() is already a
> memmove, in that it does that
>
> cmp %dil, %sil
> jl .Lcopy_backward
>
> thing that seems to mean that it already handles the overlapping case.
>
> Anyway, that 32-bit memmove() implemented (badly) as inline asm does
> need to go. As you point out, it seems to get the clobbers wrong too,
> so it's actively buggy and just happens to work because there's
> nothing else in that function, and it clobbers %bx that is supposed to
> be callee-saved, but *presumably* the compile has to allocate %bx one
> of the other registers, so it will save and restore %bx anyway.
>
> So that current memmove() seems to be truly horrendously buggy, but in
> a "it happens to work" way. Your patch seems an improvement, but I get
> the feeling that it could be improved a lot more...
--
Thanks,
~Nick Desaulniers