Re: Big git diff speedup by avoiding x86 "fast string" memcmp

From: Nick Piggin
Date: Mon Dec 13 2010 - 03:25:19 EST


On Mon, Dec 13, 2010 at 6:29 PM, J. R. Okajima <hooanon05@xxxxxxxxxxx> wrote:
>
> Nick Piggin:
>> It's not scaling but just single threaded performance. gcc turns memcmp
>> into rep cmp, which has quite a long latency, so it's not appropriate
>> for short strings.
>
> Honestly speaking I doubt how this 'long *' approach is effective
> (Of course it never means that your result (by 'char *') is doubtful).

Well, let's see what turns up. We certainly can try the long *
approach. I suspect on architectures where byte loads are
very slow, gcc will block the loop into larger loads, so it should
be no worse than a normal memcmp call, but if we do explicit
padding we can avoid all the problems associated with tail
handling.
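
For reference, the byte-at-a-time ('char *') comparison under discussion
can be sketched roughly as below. This is only an illustration, not the
actual patch: the name name_cmp_bytes is made up here, and it assumes the
caller has already checked that both names have the same length.

```c
#include <stddef.h>

/*
 * Hypothetical helper (name invented for this sketch).
 * Returns 0 if the first len bytes match, nonzero otherwise.
 * A plain loop like this avoids the startup latency of the
 * string-compare instruction on short names.
 */
static int name_cmp_bytes(const unsigned char *a, const unsigned char *b,
			  size_t len)
{
	while (len--) {
		if (*a++ != *b++)
			return 1;
	}
	return 0;
}
```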

Doing name padding and long * comparison will be practically
free (because slab allocator will align out to sizeof(long long)
anyway), so if any architecture prefers to do the long loads, I'd
be interested to hear and we could whip up a patch.
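
A rough sketch of what padding plus long-sized comparison might look
like, written as userspace C rather than kernel code. The names
name_alloc_padded and name_cmp_words are hypothetical, calloc stands in
for the slab allocator, and the caller is assumed to have compared the
name lengths first.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical: copy a name into a zero-filled buffer whose size is
 * rounded up to a multiple of sizeof(long). The zero padding means
 * the final word can be compared whole, with no tail handling.
 */
static unsigned char *name_alloc_padded(const char *name, size_t len,
					size_t *nwords)
{
	size_t words = (len + sizeof(long) - 1) / sizeof(long);
	unsigned char *buf = calloc(words ? words : 1, sizeof(long));

	if (!buf)
		return NULL;
	memcpy(buf, name, len);
	*nwords = words;
	return buf;
}

/*
 * Hypothetical: compare two padded names one long at a time.
 * Returns 0 on match, nonzero otherwise. The memcpy loads keep this
 * portable C; compilers typically turn them into single word loads.
 */
static int name_cmp_words(const unsigned char *a, const unsigned char *b,
			  size_t nwords)
{
	while (nwords--) {
		unsigned long wa, wb;

		memcpy(&wa, a, sizeof(wa));
		memcpy(&wb, b, sizeof(wb));
		if (wa != wb)
			return 1;
		a += sizeof(wa);
		b += sizeof(wb);
	}
	return 0;
}
```

Both buffers must be padded the same way for this to work, which is why
the padding would have to be done wherever the name is allocated.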

> But is the "rep cmp has quite a long latency" issue generic for all x86
> architecture, or Westmere system specific?

I don't believe it is Westmere specific. Intel and AMD have
been improving these instructions in the past few years, so
Westmere is probably as good as or better than any.

That said, rep cmp may not be as heavily optimized as the
set and copy string instructions.

In short, I think the change should be suitable for all x86 CPUs,
but I would like to hear more opinions or see numbers for other
cores.

Thanks,
Nick