Re: [REPORT] cfs-v6-rc2 vs sd-0.46 vs 2.6.21-rc7

From: Ingo Molnar
Date: Thu Apr 26 2007 - 08:08:34 EST



* Michael Gerdau <mgd@xxxxxxxxxxxx> wrote:

> Hi list,
>
> find below a test comparing
> 2.6.21-rc7 (mainline)
> 2.6.21-rc7-sd046
> 2.6.21-rc7-cfs-v6-rc2(*) (X @ nice 0)
> 2.6.21-rc7-cfs-v6-rc2(*) (X @ nice -10)
> running on a dualcore x86_64.

thanks for the testing!

as a summary: i think your numbers nicely demonstrate that the shorter
'timeslice length' that both CFS and SD utilize does not have a
measurable negative impact on your workload. To measure the total impact
of 'timeslicing' you might want to try the exact same workload with a
much longer 'timeslice length' of say 400 msecs, via:

echo 400000000 > /proc/sys/kernel/sched_granularity_ns # on CFS
echo 400 > /proc/sys/kernel/rr_interval # on SD
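
As a small sanity check on the units in the two commands above, here is a
sketch of the conversion. It assumes, as the commands imply, that
sched_granularity_ns takes nanoseconds and rr_interval takes milliseconds;
the helper names are made up for illustration.

```python
# Sketch: turn a desired timeslice length (in msecs) into the values
# written to the two tunables. Helper names are hypothetical.
NSEC_PER_MSEC = 1_000_000

def cfs_granularity_ns(timeslice_ms):
    """Value for /proc/sys/kernel/sched_granularity_ns (nanoseconds)."""
    return timeslice_ms * NSEC_PER_MSEC

def sd_rr_interval(timeslice_ms):
    """Value for /proc/sys/kernel/rr_interval (assumed to be msecs)."""
    return timeslice_ms

print(cfs_granularity_ns(400))  # 400000000
print(sd_rr_interval(400))      # 400
```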

your existing numbers are a bit hard to analyze because the 3 workloads
were started at the same time and they overlapped differently and
utilized the system differently.

i think the primary number that makes sense to look at (which is perhaps
the least sensitive to the 'overlap effect') is the 'combined user times
of all 3 workloads' (in order of performance):

> 2.6.21-rc7: 20589.423 100.00%
> 2.6.21-rc7-cfs-v6-rc2 (X @ nice -10): 20613.845 99.88%
> 2.6.21-rc7-sd046: 20617.945 99.86%
> 2.6.21-rc7-cfs-v6-rc2 (X @ nice 0): 20743.564 99.25%

to me this gives the impression that it's all "within noise". In
particular the two CFS results suggest that there's at least ~100
seconds of noise in these results, because the renicing of X should have
no impact on the result (the workloads are pure number-crunchers, and
all use up the CPUs 100%, correct?), and even if it has an impact,
renicing X to nice 0 should _speed up_ the result - not slow it down a
bit like the numbers suggest.
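
The noise estimate above can be reproduced with a few lines. This is only
a sketch: the figures are copied from the user-time table, and using the
gap between the two CFS runs as a lower bound on run-to-run noise is the
assumption made in the paragraph above, not a proper statistical test.

```python
# Combined user times (seconds), copied from the table above.
user_times = {
    "2.6.21-rc7": 20589.423,
    "2.6.21-rc7-cfs-v6-rc2 (X @ nice -10)": 20613.845,
    "2.6.21-rc7-sd046": 20617.945,
    "2.6.21-rc7-cfs-v6-rc2 (X @ nice 0)": 20743.564,
}

baseline = user_times["2.6.21-rc7"]

# Gap between the two CFS runs, used as a rough lower bound on noise.
cfs_spread = abs(
    user_times["2.6.21-rc7-cfs-v6-rc2 (X @ nice 0)"]
    - user_times["2.6.21-rc7-cfs-v6-rc2 (X @ nice -10)"]
)
cfs_spread_pct = 100.0 * cfs_spread / baseline

print(f"CFS run-to-run spread: {cfs_spread:.3f} s ({cfs_spread_pct:.2f}%)")

# Difference of every kernel from mainline, in seconds and percent.
for name, t in user_times.items():
    delta = t - baseline
    print(f"{name}: {delta:+.3f} s ({100.0 * delta / baseline:+.2f}%)")
```

The spread between the two CFS runs comes out at about 130 seconds, or
roughly 0.6% of the total user time.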

another (perhaps less reliable) number is the total wall-clock runtime
of all 3 jobs. Provided i did not make any mistakes in my calculations,
here are the results:

> 2.6.21-rc7-sd046: 10512.611 seconds
> 2.6.21-rc7-cfs-v6-rc2 (X @ nice -10): 10605.946 seconds
> 2.6.21-rc7: 10650.535 seconds
> 2.6.21-rc7-cfs-v6-rc2 (X @ nice 0): 10788.125 seconds

(the numbers are lower than the first numbers because this is a 2 CPU
system)
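
To illustrate the 2 CPU remark, here is a rough sketch that divides the
combined user time by the wall-clock time for each kernel. The figures are
copied from the two tables above; reading the ratio as the "average number
of busy CPUs" is a simplification that ignores system time.

```python
# (combined user time, total wall-clock time) in seconds, per kernel,
# copied from the tables above.
runs = {
    "2.6.21-rc7": (20589.423, 10650.535),
    "2.6.21-rc7-sd046": (20617.945, 10512.611),
    "2.6.21-rc7-cfs-v6-rc2 (X @ nice -10)": (20613.845, 10605.946),
    "2.6.21-rc7-cfs-v6-rc2 (X @ nice 0)": (20743.564, 10788.125),
}

# user / wall approximates how many CPUs were busy on average.
busy_cpus = {name: user / wall for name, (user, wall) in runs.items()}

for name, ratio in busy_cpus.items():
    print(f"{name}: {ratio:.3f} CPUs busy on average")
```

All four ratios land a bit below 2 (between roughly 1.92 and 1.96), which
is what one would expect on a dualcore box that is occasionally idle.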

both SD and CFS-nice-10 were faster than mainline, but i'd say this too
is noise - especially because this result highly depends on the way the
workloads overlap in general, which seems to be different for SD.

system time is interesting too:

> 2.6.21-rc7: 35.379
> 2.6.21-rc7-cfs-v6-rc2 (X @ nice -10): 40.399
> 2.6.21-rc7-cfs-v6-rc2 (X @ nice 0): 44.239
> 2.6.21-rc7-sd046: 45.515

here too the two CFS results seem to suggest that there's at least
around 5 seconds of noise. So i'd not necessarily call it systematic
that vanilla had the lowest system time and SD had the highest.

combined system+user time:

> 2.6.21-rc7: 20624.802
> 2.6.21-rc7-cfs-v6-rc2 (X @ nice 0): 20658.084
> 2.6.21-rc7-sd046: 20663.460
> 2.6.21-rc7-cfs-v6-rc2 (X @ nice -10): 20783.963

perhaps it might make more sense to run the workloads serialized, to
have better comparability of the individual workloads. (on a real system
you'd naturally want to overlap these workloads to utilize the CPUs, so
the numbers you measured are very relevant too.)
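
A serialized run could be scripted along these lines. This is a minimal
sketch, not the harness used for the report: the workload command is a
placeholder (a short Python busy loop), and run_serialized() is a
hypothetical helper. It relies on resource.getrusage(RUSAGE_CHILDREN),
which is Unix-only, to read user and system time of finished children.

```python
import resource
import subprocess
import sys
import time

def run_serialized(commands):
    """Run each command to completion, one after another, and record
    wall-clock, user and system time for each of them."""
    results = []
    for cmd in commands:
        before = resource.getrusage(resource.RUSAGE_CHILDREN)
        start = time.monotonic()
        subprocess.run(cmd, check=True)
        wall = time.monotonic() - start
        after = resource.getrusage(resource.RUSAGE_CHILDREN)
        results.append({
            "cmd": cmd,
            "wall": wall,
            "user": after.ru_utime - before.ru_utime,
            "system": after.ru_stime - before.ru_stime,
        })
    return results

# Placeholder number-cruncher; substitute the real workloads here.
busy = [sys.executable, "-c", "sum(i * i for i in range(200000))"]

results = run_serialized([busy, busy, busy])
for r in results:
    print(f"wall={r['wall']:.3f}s user={r['user']:.3f}s "
          f"system={r['system']:.3f}s")
```

Because each job has the machine to itself, the per-workload numbers can
be compared across kernels without the overlap effect.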

The vmstat output suggested there is occasional idle time in the system -
is the workload IO-bound (or memory-bound) in those cases?

Ingo