Re: [PATCH] x86: handle the tail in rep_movs_alternative() with an overlapping store
From: Linus Torvalds
Date: Thu Mar 20 2025 - 15:24:17 EST
On Thu, 20 Mar 2025 at 12:06, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> Sizes ranged <8,64> are copied 8 bytes at a time with a jump out to a
> 1 byte at a time loop to handle the tail.
I definitely do not mind this patch, but I think it doesn't go far enough.
It gets rid of the byte-at-a-time loop at the end, but only for the
short-copy case of 8-63 bytes.
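
For anyone following along, the overlapping-store idea looks roughly like
this in C. This is only an illustrative sketch, not the kernel code: the
function name copy_overlapping_tail() is made up, it assumes len >= 8 and
non-overlapping buffers, and it ignores the fault handling the real user
copy needs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): copy len bytes, len >= 8,
 * without a byte-at-a-time tail loop. src and dst must not overlap.
 */
static void copy_overlapping_tail(unsigned char *dst,
                                  const unsigned char *src, size_t len)
{
    uint64_t word;
    size_t off = 0;

    /* Copy whole 8-byte words while more than 8 bytes remain. */
    while (len - off > 8) {
        memcpy(&word, src + off, 8);
        memcpy(dst + off, &word, 8);
        off += 8;
    }

    /*
     * Final 8-byte access ends exactly at len. It may overlap bytes
     * already written above, which is harmless because it rewrites
     * them with the same data.
     */
    memcpy(&word, src + len - 8, 8);
    memcpy(dst + len - 8, &word, 8);
}
```

With len = 13, for example, the loop copies bytes 0-7 and the final store
copies bytes 5-12, so bytes 5-7 are simply written twice.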
The .Llarge_movsq ends up still doing

        testl %ecx,%ecx
        jne .Lcopy_user_tail
        RET

and while that is only triggered in the non-ERMS case, that's the path
most older AMD CPUs will take, afaik.
So I think that if we do this, we should do it properly.
Linus