Re: Big git diff speedup by avoiding x86 "fast string" memcmp
From: J. R. Okajima
Date: Fri Dec 10 2010 - 10:01:38 EST
Nick Piggin:
> The standard memcmp function on a Westmere system shows up hot in
> profiles in the `git diff` workload (both parallel and single threaded),
> and it is likely due to the costs associated with trapping into
> microcode, and little opportunity to improve memory access (dentry
> name is not likely to take up more than a cacheline).
Let me make sure I understand.
What you are pointing out is:
- asm("repe; cmpsb") may hold the CPU for a long time, which can be a
  hazard for scaling.
- breaking the comparison into smaller pieces increases the chances to
  scale.
Right?
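To make the question concrete: as I understand it, the alternative to
"repe; cmpsb" is a plain C loop along these lines. This is only my sketch
under that assumption; byte_differs() is a made-up name and not the
function from your patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 0 when the first count bytes are equal and 1 otherwise;
 * unlike memcmp() it carries no ordering information.
 */
static int byte_differs(const unsigned char *cs, const unsigned char *ct,
			size_t count)
{
	while (count--) {
		if (*cs++ != *ct++)
			return 1;	/* first mismatch ends the loop early */
	}
	return 0;
}
```

Each iteration is one load, one compare and one branch, so there is no
string instruction involved at all.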
Anyway, this approach of replacing the smallest code with larger but
faster code is interesting.
How about mixing 'unsigned char *' and 'unsigned long *' when referencing
the given strings?
For example,
int f(const unsigned char *cs, const unsigned char *ct, size_t count)
{
	int ret;
	union {
		const unsigned long *l;
		const unsigned char *c;
	} s, t;

	/* this macro is your dentry_memcmp() actually */
#define cmp(s, t, c, step) \
	do { \
		while ((c) >= (step)) { \
			ret = (*(s) != *(t)); \
			if (ret) \
				return ret; \
			(s)++; \
			(t)++; \
			(c) -= (step); \
		} \
	} while (0)

	s.c = cs;
	t.c = ct;
	cmp(s.l, t.l, count, sizeof(*s.l));
	cmp(s.c, t.c, count, sizeof(*s.c));
	return 0;
}
What I am thinking here is:
- for a load and compare, there is probably no difference in cost
  between 'char *' and 'long *'.
- obviously, stepping by sizeof(long) reduces the number of iterations.
- but I am not sure whether the strings are generally longer than
  4 (or 8) bytes or not.
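One caveat about f() above: it dereferences an 'unsigned long *' that may
not be aligned, which is not safe on every architecture. A variant that
loads each word with memcpy() avoids that. Compilers typically turn a
fixed-size memcpy() into a plain load where unaligned loads are allowed,
but I have not measured it. word_differs() is a hypothetical name used
only for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: compare sizeof(long) bytes at a time, then finish
 * the remaining tail byte by byte.
 * Returns 0 when the first count bytes are equal and 1 otherwise.
 */
static int word_differs(const unsigned char *cs, const unsigned char *ct,
			size_t count)
{
	unsigned long a, b;

	while (count >= sizeof(a)) {
		/* memcpy() makes the load valid for any alignment */
		memcpy(&a, cs, sizeof(a));
		memcpy(&b, ct, sizeof(b));
		if (a != b)
			return 1;
		cs += sizeof(a);
		ct += sizeof(a);
		count -= sizeof(a);
	}

	/* fewer than sizeof(long) bytes are left */
	while (count--) {
		if (*cs++ != *ct++)
			return 1;
	}
	return 0;
}
```

For names shorter than sizeof(long) the first loop never runs and this
degenerates to the byte loop, so the gain depends entirely on the typical
name length.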
J. R. Okajima