Re: [PATCH] lib/string: Bring optimized memcmp from glibc
From: Linus Torvalds
Date: Wed Jul 21 2021 - 14:45:59 EST
On Wed, Jul 21, 2021 at 11:17 AM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
>
> I find it somewhat arbitrary that we choose to align the 2nd pointer and
> not the first.
Yeah, that's a bit odd, but I don't think it matters.
The hope is obviously that they are mutually aligned, and in that case
it doesn't matter which one you aim to align.
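(To make "mutually aligned" concrete: it just means the two pointers
have the same offset within a word, so aligning either one aligns the
other too. A small illustrative helper - the name is mine, not anything
in the patch:

	static inline int mutually_aligned(const void *a, const void *b)
	{
		/* same low address bits => aligning one aligns both */
		unsigned long mask = sizeof(unsigned long) - 1;

		return (((unsigned long)a ^ (unsigned long)b) & mask) == 0;
	}

If that xor has any low bits set, no amount of pointer bumping will ever
get both sources aligned at the same time.)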
> So you are saying that the current memcmp could indeed use improvement
> but you don't want it to be based on the glibc's code due to the ugly
> misalignment handling?
Yeah. I suspect that this (very simple) patch gives you the same
performance improvement that the glibc code does.
NOTE! I'm not saying this patch is perfect. This one doesn't even
_try_ to do the mutual alignment, because that handling is really
silly. But I'm throwing this out here for discussion, because
- it's really simple
- I suspect it gets you 99% of the way there
- the code generation is actually quite good with both gcc and clang.
This is gcc:
memcmp:
	jmp	.L60
.L52:
	movq	(%rsi), %rax
	cmpq	%rax, (%rdi)
	jne	.L53
	addq	$8, %rdi
	addq	$8, %rsi
	subq	$8, %rdx
.L60:
	cmpq	$7, %rdx
	ja	.L52
	testq	%rdx, %rdx
	je	.L61
.L53:
	xorl	%ecx, %ecx
	jmp	.L56
.L62:
	addq	$1, %rcx
	cmpq	%rcx, %rdx
	je	.L51
.L56:
	movzbl	(%rdi,%rcx), %eax
	movzbl	(%rsi,%rcx), %r8d
	subl	%r8d, %eax
	je	.L62
.L51:
	ret
.L61:
	xorl	%eax, %eax
	ret
and notice how there are no spills, no extra garbage, just simple and
straightforward code.
Those things end up mattering too - it's good for I$, it's good for the
small cases, and it's good for debugging and reading the code.
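(If you want to poke at the code generation yourself, the patched loop
drops into a standalone userspace file easily enough. A rough sketch,
with a memcpy-based load standing in for get_unaligned(), compiled with
something like "gcc -O2 -S":

	#include <stddef.h>
	#include <string.h>

	/* stand-in for get_unaligned(): gcc turns the memcpy into a plain load */
	static unsigned long load_ul(const void *p)
	{
		unsigned long v;

		memcpy(&v, p, sizeof(v));
		return v;
	}

	int test_memcmp(const void *cs, const void *ct, size_t count)
	{
		const unsigned char *su1, *su2;
		int res = 0;

		if (count >= sizeof(unsigned long)) {
			const unsigned long *u1 = cs;
			const unsigned long *u2 = ct;
			do {
				if (load_ul(u1) != load_ul(u2))
					break;
				u1++;
				u2++;
				count -= sizeof(unsigned long);
			} while (count >= sizeof(unsigned long));
			cs = u1;
			ct = u2;
		}
		for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
			if ((res = *su1 - *su2) != 0)
				break;
		return res;
	}

That's not the kernel build, obviously, but it shows the same shape of
code as the listing above.)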
If this is "good enough" for your test-case, I really would prefer
something like this. "Make it as simple as possible, but no simpler"
I can do the mutual alignment too, but I'd actually prefer to do it as
a separate patch, for when there are numbers for that.
And I wouldn't do it as a byte-by-byte case, because that's just stupid.
I'd do it with a single first "get unaligned word from both sources,
compare them for equality, and then add only enough bytes to align"
step.
Linus
lib/string.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/lib/string.c b/lib/string.c
index 77bd0b1d3296..b2de45a581f4 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -29,6 +29,7 @@
 #include <linux/errno.h>
 #include <linux/slab.h>
 
+#include <asm/unaligned.h>
 #include <asm/byteorder.h>
 #include <asm/word-at-a-time.h>
 #include <asm/page.h>
@@ -935,6 +936,21 @@ __visible int memcmp(const void *cs, const void *ct, size_t count)
 	const unsigned char *su1, *su2;
 	int res = 0;
 
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	if (count >= sizeof(unsigned long)) {
+		const unsigned long *u1 = cs;
+		const unsigned long *u2 = ct;
+		do {
+			if (get_unaligned(u1) != get_unaligned(u2))
+				break;
+			u1++;
+			u2++;
+			count -= sizeof(unsigned long);
+		} while (count >= sizeof(unsigned long));
+		cs = u1;
+		ct = u2;
+	}
+#endif
 	for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
 		if ((res = *su1 - *su2) != 0)
 			break;