Re: [PATCH] x86: add back the alignment of the destination to 8 bytes in copy_user_generic()
From: Linus Torvalds
Date: Fri Mar 14 2025 - 15:06:56 EST
On Fri, 14 Mar 2025 at 07:53, Herton R. Krzesinski <herton@xxxxxxxxxx> wrote:
>
> --- a/arch/x86/include/asm/uaccess_64.h
> +++ b/arch/x86/include/asm/uaccess_64.h
> @@ -130,7 +130,7 @@ copy_user_generic(void *to, const void *from, unsigned long len)
> "2:\n"
> _ASM_EXTABLE_UA(1b, 2b)
> :"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
> - : : "memory", "rax");
> + : : "memory", "rax", "rdx", "r8");
Please don't penalize the caller with the extra clobbers.
Maybe it doesn't matter - these functions are marked always_inline,
but they aren't inlined in very many places, and maybe those places
have registers to spare - but let's not penalize the FSRM case anyway.
And we do call it "rep_movs_alternative", so let's keep it close to
"rep movs" semantics (yes, we already clobber %rax, but let's not make
it worse).
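Just to make the cost concrete, a toy sketch (not the actual
copy_user_generic(), which also has the ALTERNATIVE and
exception-table bits shown in the quoted diff above):

static inline unsigned long toy_copy(void *to, const void *from, unsigned long len)
{
	/*
	 * With just "memory" clobbered, the compiler only has to give
	 * up %rcx/%rdi/%rsi (the in/out operands).  Add "rdx", "r8" to
	 * the clobber list and it must also keep live values out of
	 * those two registers (or spill them) around every inlined
	 * copy, whether or not the copy ever touches them.
	 */
	asm volatile("rep movsb"
		     : "+c" (len), "+D" (to), "+S" (from)
		     : : "memory");
	return len;
}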
As to the actual change to rep_movs - that should be done differently
too. In particular, I doubt it makes any sense to try to align the
destination for small writes or for the ERMS case when we use 'rep
movsb', so I think this should all go into just the ".Llarge_movsq"
case.
.. and then the patch can be further optimized to just do the first -
possibly unaligned - destination word unconditionally, and then
update the addresses and counts so that the rest of the copy is aligned.
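Spelled out in C, the arithmetic is roughly this (a sketch with
made-up names, just to show the idea - the real thing is the asm
patch below):

#include <string.h>

/*
 * Sketch only, not kernel code.  'len' is at least 8 here, since
 * this is the .Llarge_movsq path.
 */
static void align_dst_then_copy(unsigned char *dst, const unsigned char *src,
				unsigned long len)
{
	unsigned long adjust;

	/* do the first, possibly unaligned, destination word */
	memcpy(dst, src, 8);

	/*
	 * Distance to the next 8-byte boundary of the destination:
	 * 8 if it was already aligned (exactly the word just copied),
	 * 1..7 otherwise, in which case the next word overlaps the
	 * first copy a little - which is harmless.
	 */
	adjust = (((unsigned long)dst + 8) & ~7UL) - (unsigned long)dst;

	dst += adjust;
	src += adjust;
	len -= adjust;

	/* ... then the remaining len/8 aligned words plus a byte tail */
	memcpy(dst, src, len);
}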
Something ENTIRELY UNTESTED like the patch below, in other words. And I wrote it
so that it doesn't need any new temporary registers, so no need for
clobbers or for some save/restore code.
NOTE! The patch below is very intentionally whitespace-damaged.
Anybody who applies this needs to look at it very carefully, because I
just threw this together with zero testing and only very limited
thought.
But if it works, and if it actually improves performance, I think it
might be a fairly minimal approach to "align destination".
Linus
----
arch/x86/lib/copy_user_64.S | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index fc9fb5d06174..1c3090af3807 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -74,6 +74,23 @@ SYM_FUNC_START(rep_movs_alternative)
_ASM_EXTABLE_UA( 0b, 1b)
.Llarge_movsq:
+ /* Do the first possibly unaligned word */
+0: movq (%rsi),%rax
+1: movq %rax,(%rdi)
+ _ASM_EXTABLE_UA( 0b, .Lcopy_user_tail)
+ _ASM_EXTABLE_UA( 1b, .Lcopy_user_tail)
+
+ /* What would be the offset to the aligned destination? */
+ leaq 8(%rdi),%rax
+ andq $-8,%rax
+ subq %rdi,%rax
+
+ /* .. and update pointers and count to match */
+ addq %rax,%rdi
+ addq %rax,%rsi
+ subq %rax,%rcx
+
+ /* make %rcx contain the number of words, %rax the remainder */
movq %rcx,%rax
shrq $3,%rcx
andl $7,%eax