Re: [RFC, patch] i386: vgetcpu(), take 2

From: Linus Torvalds
Date: Wed Jun 21 2006 - 13:32:54 EST

On Wed, 21 Jun 2006, Andi Kleen wrote:
>
> My measurements show otherwise: I get 60+ cycles on K8 and 150+ cycles
> on P4. That is with a full vsyscall around it. It is still far better
> than CPUID, but slower than RDTSCP on those CPUs that support it.
>
> I changed the CPUID fallback path to use LSL on x86-64

One note of warning:

Playing "clever games" has a real tendency to suck badly eventually. I'm
betting LSL is pretty damn low on any list of instructions to be optimized
by the CPU core, so it would tend to always be microcoded, while other ops
might get faster.

> Measuring this way is a bad idea because you get far too much
> noise from the RDTSCs. Usually you need to put a loop of a few thousand
> iterations between the two RDTSCP reads and divide the result by the
> loop count.

And measuring that way isn't perfect either, because it tends to show you
how well an instruction works in that particular instruction mix, but not
necessarily in real life.
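
For reference, the loop-and-divide approach quoted above looks roughly like
the sketch below. It is only an illustration under assumptions: it uses
clock_gettime() as a portable stand-in for RDTSC/RDTSCP, and now_ns(),
op_under_test() and avg_ns_per_op() are hypothetical names, not anything from
the patch. The number it produces is an average for this one tight loop, so
it carries exactly the instruction-mix caveat described above.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds (assumed stand-in for RDTSC/RDTSCP). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical operation under test; swap in the sequence of interest. */
static volatile uint32_t sink;

static void op_under_test(void)
{
	sink = sink * 1664525u + 1013904223u;
}

/*
 * Read the clock once before and once after `iters` back-to-back calls,
 * then divide, so the cost of the two clock reads is spread over the
 * whole loop instead of dominating a single measurement.
 */
static double avg_ns_per_op(void (*op)(void), unsigned long iters)
{
	uint64_t start, end;
	unsigned long i;

	assert(iters > 0);

	start = now_ns();
	for (i = 0; i < iters; i++)
		op();
	end = now_ns();

	return (double)(end - start) / (double)iters;
}
```

Calling avg_ns_per_op(op_under_test, 100000) returns the average cost per
call in nanoseconds, loop overhead included.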

Benchmarking single instructions is simply damn hard. It's often better to
try to find a real load where that particular sequence is important enough
to be measurable at all, and then try the alternatives. Not perfect
either, but if you can't find such a load, maybe you shouldn't be doing it
in the first place. And if you _can_ find such a real load, at least you
measured something that was actually real.

Linus