Re: [PATCH] lib/string: Bring optimized memcmp from glibc

From: Nikolay Borisov
Date: Wed Jul 21 2021 - 14:17:34 EST




On 21.07.21 г. 21:00, Linus Torvalds wrote:
> On Wed, Jul 21, 2021 at 6:59 AM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
>>
>> This is glibc's memcmp version. The upside is that for architectures
>> which don't have an optimized version the kernel can provide some
>> solace in the form of a generic, word-sized optimized memcmp. I tested
>> this with a heavy IOCTL_FIDEDUPERANGE(2) workload and here are the
>> results I got:
>
> Hmm. I suspect the usual kernel use of memcmp() is _very_ skewed to
> very small memcmp calls, and I don't think I've ever seen that
> (horribly bad) byte-wise default memcmp in most profiles.
>
> I suspect that FIDEDUPERANGE thing is most likely a very special case.
>
> So I don't think you're wrong to look at this, but I think you've gone
> from our old "spend no effort at all" to "look at one special case".
>
> And I think the glibc implementation is horrible and doesn't know
> about machines where unaligned loads are cheap - which is all
> reasonable ones.
>
> That MERGE() macro is disgusting, and memcmp_not_common_alignment()
> should not exist on any sane architecture. It's literally doing extra
> work to make for slower accesses, when the hardware does it better
> natively.
>
> So honestly, I'd much rather see a much saner and simpler
> implementation that works well on the architectures that matter, and
> that don't want that "align things by hand".
>
> Aligning one of the sources by hand is fine and makes sense - so that
> _if_ the two strings end up being mutually aligned, all subsequent
> accesses are aligned.

I find it somewhat arbitrary that we choose to align the 2nd pointer and
not the first. Obviously it'll be easy to detect which one of the 2 is
unaligned and align it so that from thereon memcmp can continue doing
aligned accesses. However, this means a check like that would be done
for *every* (well, barring some threshold value) access to memcmp.

>
> But then trying to do shift-and-masking for the possibly remaining
> unaligned source is crazy and garbage. Don't do it.
>
> And you never saw that, because your special FIDEDUPERANGE testcase
> will never have anything but mutually aligned cases.
>
> Which just shows that going from "don't care at all' to "care about
> one special case" is not the way to go.
>
> So I'd much rather see a simple default function that works well for
> the sane architectures, than go with the default code from glibc - and
> bad for the common modern architectures.

So you are saying that the current memcmp could indeed use improvement
but you don't want it to be based on the glibc's code due to the ugly
misalignment handling?

>
> Then architectures could choose that one with some

So you are suggesting keeping the current byte comparison one aka
'naive' and having another, more optimized generic implementation that
should be selected by GENERIC_MEMCMP or have I misunderstood you ?

>
> select GENERIC_MEMCMP
>
> the same way we have
>
> select GENERIC_STRNCPY_FROM_USER
>
> for the (sane, for normal architectures) common optimized case for a
> special string instruction that matters a lot for the kernel.
>
> Linus
>