Re: [RFCv2] string: Use faster alternatives when constant arguments are used

From: Sultan Alsawaf
Date: Sat Mar 30 2019 - 19:00:17 EST

On Mon, Mar 25, 2019 at 10:24:00PM +0100, Rasmus Villemoes wrote:
> What I'm worried about is your patch changing every single strcmp(,
> "literal") into a memcmp, with absolutely no way of knowing or checking
> anything about the other buffer. And actually, it doesn't have to be a
> BE arch with a word-at-a-time memcmp.
> If (as is usually the case) the strcmp() result is compared to zero, after you
> change
> !strcmp(buf, "literal")
> into
> !memcmp(buf, "literal", 8)
> the compiler may (exactly as you want it to) change that into a single
> 8-byte load (or two 4-byte loads) and comparisons to literals, no
> memcmp() involved. And how do you know that _that_ is ok, for every one
> of the hundreds, if not thousands, of instances in the tree?

When would this not be ok though? From what I've always known,

strcmp(terminated_buf1, terminated_buf2)

is equivalent to

memcmp(terminated_buf1, terminated_buf2, strlen(terminated_buf1))

and to

memcmp(terminated_buf1, terminated_buf2, strlen(terminated_buf2))

regardless of whether or not one side is a literal (my patch just leverages the
compiler's ability to recognize strlen called on literals and optimize it out).
The memcmp forms would indeed perform worse than the plain strcmp when neither
argument is a literal, but I don't see what makes the memcmp usage "dangerous".
How can the memcmps cross a page boundary when memcmp itself will only read
large chunks of data at word boundaries?
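
For illustration, here is a minimal userspace sketch of the rewrite under
discussion. The STREQ_LIT macro and forms_agree() are hypothetical names made
up for this example and are not taken from the patch. The length is written as
sizeof(lit), which counts the terminating NUL, so "literal" gives 8 as in the
quoted memcmp(buf, "literal", 8). The sample buffers are deliberately 16 bytes
each, an assumption that keeps every memcmp in bounds.

```c
#include <string.h>

/*
 * Hypothetical macro, for illustration only: compare a buffer against a
 * string literal with a compile-time length. sizeof(lit) includes the
 * terminating NUL, so the NUL takes part in the comparison.
 */
#define STREQ_LIT(buf, lit) (memcmp((buf), (lit), sizeof(lit)) == 0)

/* Returns 1 if the strcmp and memcmp forms agree on three sample inputs. */
int forms_agree(void)
{
	/* Every buffer is at least sizeof("literal") bytes long. */
	char same[16] = "literal";
	char longer[16] = "literally";
	char shorter[16] = "lit";

	return (!strcmp(same, "literal") == STREQ_LIT(same, "literal")) &&
	       (!strcmp(longer, "literal") == STREQ_LIT(longer, "literal")) &&
	       (!strcmp(shorter, "literal") == STREQ_LIT(shorter, "literal"));
}
```

The two forms agree here because each buffer holds at least sizeof("literal")
bytes. The sketch says nothing about callers whose buffer is shorter than the
literal, which is the case the quoted objection is about.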

And if there are concerns for some arches but not others, then couldn't this be
a feasible optimization for those which would work well with it?