Re: x86 copy performance regression

From: Eric Dumazet
Date: Fri May 26 2023 - 12:37:29 EST


On Fri, May 26, 2023 at 6:30 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, May 26, 2023 at 8:00 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > We can see rep_movs_alternative() using more cycles in kernel profiles
> > than the previous variant (copy_user_enhanced_fast_string, which was
> > simply using "rep movsb"), and we can not reach line rate (as we
> > could before the series)
>
> Hmm. I assume the attached patch ends up fixing the regression?
>
> That hack to generate the two-byte 'jae' instruction even for the
> alternative is admittedly not pretty, but I just couldn't deal with
> the alternative that generated pointlessly bad code.
>
> We could make the constant in the comparison depend on whether it is
> for the unrolled or for the erms case too, I guess, but I think erms
> is probably "good enough" with 64-byte copies.
>
> I was really hoping we could avoid this, but hey, a regression is a regression.
>
> Can you verify this patch fixes things for you?


Hmm.. my build environment does not like this yet :)

arch/x86/lib/copy_user_64.S:40:30: error: unexpected token in argument list
0: alternative ".byte 0x73," ".Lunrolled" "-0b-2", ".byte 0x73," ".Llarge" "-0b-2", X86_FEATURE_ERMS
^
make[3]: *** [scripts/Makefile.build:374: arch/x86/lib/copy_user_64.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:494: arch/x86/lib] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:2026: .] Error 2
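
For context on what the failing line is trying to emit: the `.byte 0x73, <target>-0b-2` form hand-encodes a two-byte short `jae`, i.e. opcode 0x73 followed by a signed 8-bit displacement measured from the end of the instruction (label `0:` plus 2). A minimal C sketch of that arithmetic follows; `encode_short_jae` and its offset parameters are hypothetical names for illustration, not anything from the kernel tree, and this does not address the assembler error itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not kernel code).
 * Encode a short "jae rel8" located at insn_off that jumps to target_off.
 * rel8 is relative to the address of the *next* instruction, which is
 * insn_off + 2 for this two-byte encoding; this mirrors "target - 0b - 2".
 * Returns 0 on success, -1 if the displacement does not fit in 8 bits.
 */
static int encode_short_jae(uint8_t out[2], long insn_off, long target_off)
{
    long disp = target_off - (insn_off + 2);

    assert(out != NULL);
    if (disp < -128 || disp > 127)
        return -1;              /* would need the longer rel32 form */

    out[0] = 0x73;              /* opcode for jae/jnb/jnc rel8 */
    out[1] = (uint8_t)(int8_t)disp;
    return 0;
}
```

With the instruction at offset 0x10 and the target at 0x40, this gives the bytes 0x73 0x2e; a target more than 127 bytes past the end of the instruction is rejected, which is the case where an assembler would normally fall back to the longer encoding.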