Well, this is a fine theory - and it was what I thought when I started
measuring things - but it's wrong in practice. If you take a suite of
tests, lmbench for example, and do a bunch of runs and scatter plot them
and stare at them, you'll see patterns emerging. Now if the pattern is
that most run times cluster around the min, then my feeling is that
the min is the right number. Wherever they cluster is the number I
wanted, because that is the number most likely to be seen.
Note that if the min is where things cluster, then the min will be very
darn close to the median.
The min is not necessarily the right answer. Suppose most of the time you
get N, but one time you happen to get very lucky in your cache layout
(or whatever) and you get N/2. Is N the right answer or is N/2? In my
mind, the right answer is the one that is most frequently reproduced.
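To make that concrete, here's a rough sketch (not lmbench's actual code,
just an illustration with a made-up workload) of the kind of measurement
I mean: run the same thing a bunch of times, sort the times, and look at
both the min and the median.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define RUNS 11		/* odd, so the median is a single element */

/* stand-in for whatever you're actually benchmarking */
static void workload(void)
{
	volatile long sum = 0;
	for (long i = 0; i < 1000000; i++)
		sum += i;
}

static int cmp(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;
	return (x > y) - (x < y);
}

int main(void)
{
	double t[RUNS];
	struct timespec a, b;

	for (int i = 0; i < RUNS; i++) {
		clock_gettime(CLOCK_MONOTONIC, &a);
		workload();
		clock_gettime(CLOCK_MONOTONIC, &b);
		t[i] = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
	}
	qsort(t, RUNS, sizeof(t[0]), cmp);

	/*
	 * The min may be a lucky one-off; the median is what you will
	 * typically be able to reproduce.
	 */
	printf("min    %.6f sec\n", t[0]);
	printf("median %.6f sec\n", t[RUNS / 2]);
	return 0;
}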
So, yes, you can get better numbers than what I report. But not with
any high probability. Choosing the median was on purpose: I wanted
numbers that represented what I thought people would actually get when
doing similar things. I'm really sick of benchmarks that tweak things
like crazy (e.g., -lspec_malloc) to get better numbers, numbers which
are hard or impossible to reproduce in real life.