Re: x86 copy performance regression

From: Linus Torvalds
Date: Fri May 26 2023 - 13:17:38 EST


On Fri, May 26, 2023 at 10:00 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Let me go look at it some more. I *really* didn't want to make the
> code worse for ERMS

Oh well. I'll think about it some more in the hope that I can come up
with something clever that doesn't make objtool hate me, but in the
meantime let me just give you the "not clever" patch.

It generates an annoying six-byte jump when the small 2-byte one would
work just fine, but I guess only my pride is wounded.

Linus
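
Read as C, the entry check in the patch below behaves roughly like this
sketch. The enum, the function name and the has_erms flag are invented
for illustration and do not exist in the kernel. The real choice is made
by alternatives patching of the jump target, not by a runtime test of a
flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the paths inside rep_movs_alternative. */
enum copy_path {
	PATH_SMALL,	/* falls through to the "cmp $8,%ecx" code */
	PATH_UNROLLED,	/* .Lunrolled: the unrolled movq loop */
	PATH_REP_MOVSB,	/* .Llarge: the "rep movsb" added by the patch */
};

/*
 * has_erms is an assumed stand-in for X86_FEATURE_ERMS. In the patch the
 * "jae" target is rewritten once by the alternatives machinery, so no
 * such flag is tested on each call.
 */
static enum copy_path pick_copy_path(size_t len, bool has_erms)
{
	if (len >= 64)	/* cmpq $64,%rcx ; jae ... */
		return has_erms ? PATH_REP_MOVSB : PATH_UNROLLED;

	return PATH_SMALL;
}
```

Copies shorter than 64 bytes take the same path with or without ERMS;
only the 64-bytes-and-up case changes.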

 arch/x86/lib/copy_user_64.S | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 4fc5c2de2de4..7e972224b0ba 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -7,6 +7,8 @@
  */

 #include <linux/linkage.h>
+#include <asm/cpufeatures.h>
+#include <asm/alternative.h>
 #include <asm/asm.h>
 #include <asm/export.h>

@@ -29,7 +31,7 @@
  */
 SYM_FUNC_START(rep_movs_alternative)
 	cmpq $64,%rcx
-	jae .Lunrolled
+	alternative "jae .Lunrolled", "jae .Llarge", X86_FEATURE_ERMS

 	cmp $8,%ecx
 	jae .Lword
@@ -65,6 +67,12 @@ SYM_FUNC_START(rep_movs_alternative)
 	_ASM_EXTABLE_UA( 2b, .Lcopy_user_tail)
 	_ASM_EXTABLE_UA( 3b, .Lcopy_user_tail)

+.Llarge:
+0:	rep movsb
+1:	RET
+
+	_ASM_EXTABLE_UA( 0b, 1b)
+
 	.p2align 4
 .Lunrolled:
 10:	movq (%rsi),%r8