Re: [PATCH] speed up on find_first_bit for i386 (let compiler dothe work)
From: Steven Rostedt
Date: Thu Jul 28 2005 - 13:32:52 EST
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> > I've been playing with different approaches, (still all hot cache
> > though), and inspecting the generated code. It's not that the gcc
> > generated code is always better for the normal case. But since it sees
> > more and everything is not hidden in asm, it can optimise what is being
> > used, and how it's used.
>
> Since you're considering GCC-generated code for ffs(), ffz() and friends,
> how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> apropriate? Reasonably recent GCC may actually be good enough to use the
> fastest code depending on the processor submodel selected.
>
OK, I guess when I get some time, I'll start testing all the i386 bitop
functions, comparing the asm with the gcc versions. Now could someone
explain to me what's wrong with testing hot cache code. Can one
instruction retrieve from memory better than others? I was trying to
see which whas slower in CPU, but if an algorithm aligns with the cache
or something that is faster, my hot cache testing will not catch that.
But I don't know how to write a test that would test over and over again
something that is not in cache. It would seem that I would have to find
a way to flush the L1 and L2 cache each time. But it still seems to be
adding too many variables to the equation to get good meaningful
benchmarks.
-- Steve
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/