Re: RE: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match

From: Christophe JAILLET
Date: Tue Oct 25 2022 - 15:33:09 EST


On 25/10/2022 at 19:53, Nathan Moinvaziri wrote:
Hi Andy,

I appreciate your quick feedback!

I have done as you suggested and published my results, this time using Google Benchmark:
https://github.com/nmoinvaz/strcasecmp

Hi,
the algorithm on GitHub is not the same as the one posted here.

IIUC, the one on GitHub is wrong. If you compare 2 strings that are the same, they will have the same length, so "if (c1 == c2) continue;" is also taken on the trailing \0 and the loop will go one past the end of the strings. The result will then be <0, 0 or >0 depending on the chars *after* the trailing \0.

On the other hand, the benchmark results on GitHub are likely not accurate for the algorithm posted here, because the posted one performs one more test per iteration ("while (c1 != 0)") as long as the 2 strings are the same.
On GitHub this test is skipped because execution goes through the "continue".
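
To make the point above concrete, here is a minimal sketch in C. It is NOT the code from the GitHub repository: buggy_strcasecmp() and checked_strcasecmp() are hypothetical names, and the loop shape is only an assumption about what an unconditional "if (c1 == c2) continue;" could look like. The second function shows one possible way to keep the fast path while still stopping at the terminator.

```c
#include <ctype.h>

/* Hypothetical reconstruction of the problem: equal chars always "continue". */
static int buggy_strcasecmp(const char *s1, const char *s2)
{
	unsigned char c1, c2;

	for (;;) {
		c1 = (unsigned char)*s1++;
		c2 = (unsigned char)*s2++;
		if (c1 == c2)
			continue;	/* also taken when both chars are \0 */
		c1 = (unsigned char)tolower(c1);
		c2 = (unsigned char)tolower(c2);
		if (c1 != c2)
			return (int)c1 - (int)c2;
	}
}

/* Same fast path, but the terminator is checked before continuing. */
static int checked_strcasecmp(const char *s1, const char *s2)
{
	unsigned char c1, c2;

	for (;;) {
		c1 = (unsigned char)*s1++;
		c2 = (unsigned char)*s2++;
		if (c1 == c2) {
			if (c1 == 0)
				return 0;	/* both strings ended here */
			continue;
		}
		c1 = (unsigned char)tolower(c1);
		c2 = (unsigned char)tolower(c2);
		if (c1 != c2)
			return (int)c1 - (int)c2;
	}
}
```

Calling both with the literals "ab\0x" and "ab\0y" keeps the over-read inside the arrays, so the behaviour is well defined: the strings are equal up to the first \0, yet buggy_strcasecmp() returns a negative value because it goes on to compare 'x' with 'y', while checked_strcasecmp() returns 0.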

CJ


After you review it, if you still think the patch is worthwhile, I can fix the other problems you mentioned in the original patch. If you think it is not worth it, I understand.

Thanks again,
Nathan

-----Original Message-----
From: Andy Shevchenko <andy@xxxxxxxxxx>
Sent: Tuesday, October 25, 2022 2:04 AM
To: Nathan Moinvaziri <nathan@xxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match

On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote:
On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@xxxxxxxxxxx> wrote:

...

When running tests using Quick Benchmark with two matching
256-character strings, these changes result in a speed improvement of roughly 6-9x.

* We use unsigned char instead of int similar to strncasecmp.
* We only subtract c1 - c2 when they are not equal.
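
For readers without the patch at hand, the two points above could be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: sketch_strcasecmp() is a hypothetical name, userspace tolower() from <ctype.h> stands in for the kernel helper, and the do/while shape is inferred from the "while (c1 != 0)" test mentioned elsewhere in this thread.

```c
#include <ctype.h>

/* Illustrative only: lower-case the chars only when the raw bytes differ. */
int sketch_strcasecmp(const char *s1, const char *s2)
{
	unsigned char c1, c2;	/* unsigned char rather than int */

	do {
		c1 = (unsigned char)*s1++;
		c2 = (unsigned char)*s2++;
		if (c1 != c2) {
			/* Slow path: only reached on a raw mismatch. */
			c1 = (unsigned char)tolower(c1);
			c2 = (unsigned char)tolower(c2);
			if (c1 != c2)
				return (int)c1 - (int)c2;	/* subtract only when unequal */
		}
	} while (c1 != 0);	/* terminator test runs on every iteration */

	return 0;
}
```

With two byte-identical strings the slow path is never entered, which is the case the benchmark above measures; strings that differ only in case still pay for tolower() on every mismatching character.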

...

You tell us that this is more performant, but have not provided the
numbers. Can we see those, please?

So, I have read it carefully and see the reference to some Quick Benchmark tool I have no idea about. What I meant here is to have numbers provided by an open source tool (maybe even an in-kernel test case) that anybody can run on their machines. You also left out details about how you ran it, what data set was used, etc.

Note that you basically trash CPU cache lines when characters are not
equal, and before doing that you have a branch. I'm unsure that
your way is more performant than the original one.

--
With Best Regards,
Andy Shevchenko