Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code

From: Ingo Molnar
Date: Sat Jun 25 2011 - 06:12:21 EST



* Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:

> 2. With NR_CPUS < 256 the ticket size is 8 bits. The compiler doesn't
> use the same trick as the hand-coded asm to directly compare the high
> and low bytes in the word, but does a bit of extra shuffling around.
> However, the Intel optimisation guide and several x86 experts have
> opined that it's best to avoid the high-byte operations anyway, since
> they will cause a partial word stall, and the gcc-generated code should
> be better.
>
> Overall the compiler-generated code is very similar to the hand-coded
> versions, with the partial byte operations being the only significant
> difference. (Curiously, gcc does generate a high-byte compare for me
> in trylock, so it can if it wants to.)
>
> I've been running with this code in place for several months on 4 core
> systems without any problems.
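
For readers following along, the 8-bit ticket layout discussed above can be
sketched in portable C. This is only an illustration under stated
assumptions, not the code from the patch series: the names ticket_lock_t,
ticket_head() and ticket_tail() are hypothetical, the low byte is assumed to
hold the ticket being served and the high byte the next ticket to hand out,
and C11 atomics stand in for the kernel's own primitives. The shift-and-mask
helpers are the "extra shuffling" form, as opposed to comparing the high and
low byte registers directly.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low byte = ticket now being served (head),
 * high byte = next ticket to hand out (tail). */
typedef struct {
    _Atomic uint16_t head_tail;
} ticket_lock_t;

#define TICKET_SHIFT 8
#define TICKET_MASK  0xffu

static inline uint8_t ticket_head(uint16_t v) { return (uint8_t)(v & TICKET_MASK); }
static inline uint8_t ticket_tail(uint16_t v) { return (uint8_t)(v >> TICKET_SHIFT); }

static void ticket_lock(ticket_lock_t *lock)
{
    /* Take a ticket: bump the tail and keep the word as it was before. */
    uint16_t old = atomic_fetch_add_explicit(&lock->head_tail,
                                             (uint16_t)(1u << TICKET_SHIFT),
                                             memory_order_acquire);
    uint8_t mine = ticket_tail(old);

    /* Spin until the head reaches our ticket. */
    while (ticket_head(old) != mine)
        old = atomic_load_explicit(&lock->head_tail, memory_order_acquire);
}

static bool ticket_trylock(ticket_lock_t *lock)
{
    uint16_t old = atomic_load_explicit(&lock->head_tail, memory_order_relaxed);

    /* The lock is free only when head == tail: the high/low byte compare. */
    if (ticket_head(old) != ticket_tail(old))
        return false;

    uint16_t next = (uint16_t)(old + (1u << TICKET_SHIFT));
    return atomic_compare_exchange_strong_explicit(&lock->head_tail, &old, next,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void ticket_unlock(ticket_lock_t *lock)
{
    uint16_t old = atomic_load_explicit(&lock->head_tail, memory_order_relaxed);
    uint16_t next;

    /* Advance the head without carrying into the tail byte. */
    do {
        next = (uint16_t)((old & ~TICKET_MASK) | ((old + 1u) & TICKET_MASK));
    } while (!atomic_compare_exchange_weak_explicit(&lock->head_tail, &old, next,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

The compare-and-swap loop in ticket_unlock() is a portable stand-in chosen
for this sketch, so that the head can wrap from 255 to 0 without disturbing
the tail; a byte-sized increment of the head alone would serve the same
purpose.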

Please do measurements both in terms of disassembly based instruction
count(s) in the fastpath(s) (via looking at the before/after
disassembly) and actual cycle, instruction and branch counts (via
perf measurements).

> I couldn't measure a consistent performance difference between the two
> implementations; there seemed to be about +/- 1% of variation, which is
> the level of variation I see from simply recompiling the kernel with
> slightly different code alignment.

Then you've done the micro-cost measurements the wrong way - we can
and do detect much finer effects than 1%, see the methods used in
this commit for example:

c8b281161dfa: sched: Increase SCHED_LOAD_SCALE resolution
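
One common way to resolve effects smaller than the run-to-run noise is to
repeat each measurement many times and compare the means against their
standard errors. The sketch below shows that arithmetic only; it is an
assumption about a reasonable approach, not a description of what the
referenced commit does, and the helper names (sample_mean, sample_var,
means_differ) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Arithmetic mean of n samples (requires n >= 1). */
static double sample_mean(const double *x, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance (requires n >= 2). */
static double sample_var(const double *x, size_t n)
{
    double m = sample_mean(x, n);
    double acc = 0.0;

    for (size_t i = 0; i < n; i++)
        acc += (x[i] - m) * (x[i] - m);
    return acc / (double)(n - 1);
}

/*
 * True when the two means differ by more than k standard errors of
 * their difference. The comparison is done on squared quantities so
 * that no square root is needed.
 */
static bool means_differ(const double *a, size_t na,
                         const double *b, size_t nb, double k)
{
    double diff = sample_mean(a, na) - sample_mean(b, nb);
    double se2 = sample_var(a, na) / (double)na +
                 sample_var(b, nb) / (double)nb;

    return diff * diff > k * k * se2;
}
```

Here a and b would hold, for example, cycle counts from repeated runs of the
before and after kernels. Since the standard error shrinks with the square
root of the run count, more repetitions make smaller differences visible.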

Please also ensure that the cold-cache behavior is measured fairly as
well, and not only via hot-cache benchmarks (that is not always
guaranteed).

Thanks,

Ingo