Re: [RFC] scheduler: improve SMP fairness in CFS

From: Tong Li
Date: Thu Jul 26 2007 - 15:00:47 EST


On Wed, 25 Jul 2007, Ingo Molnar wrote:


* Tong Li <tong.n.li@xxxxxxxxx> wrote:

> Thanks for the patch. It doesn't work well on my 8-way box. Here's the
> output at two different times. It's also changing all the time.

you need to measure it over longer periods of time. Its not worth
balancing for such a thing in any high-frequency manner. (we'd trash the
cache constantly migrating tasks back and forth.)

I have some data below, but before that, I'd like to say, at the same load balancing rate, my proposed approach would allow us to have fairness on the order of seconds. I'm less concerned about trashing the cache. The important thing is to have a knob that allow users to trade off fairness and performance based on their needs. This is the same spirit in CFS, which by default can have much more frequent context switches than the old O(1) scheduler. A valid concern is that this is going to hurt cache and this is something that has not yet been benchmarked thoroughly. On the other hand, I fully support CFS on this because I like the idea of having a powerful infrastructure in place and exporting a set of knobs that give users full flexibility. This is also what I'm trying to achieve with my patch for SMP fairness.


the runtime ranges from 1:38.66 to 1:46.38 - that's a +-3% spread which
is pretty acceptable for ~90 seconds of runtime. Note that the
percentages you see above were likely done over a shorter period that's
why you see them range from 61% to 91%.

> I'm running 2.6.23-rc1 with your patch on a two-socket quad-core
> system. Top refresh rate is the default 3s.

the 3s is the problem: change that to 60s! We no way want to
over-migrate for SMP fairness, the change i did gives us reasonable
long-term SMP fairness without the need for high-rate rebalancing.


I ran 10 nice 0 tasks on 8 CPUs. Each task does a trivial while (1) loop. I ran the whole thing 300 seconds and, at every 60 seconds, I measured for each task the following:

1. Actual CPU time the task received during the past 60 seconds.
2. Ideal CPU time it would receive under a perfect fair scheduler.
3. Lag = ideal time - actual time
Positive lag means the task received less CPU time than its fair share and negative means it received more.
4. Error = lag / ideal time

Results of CFS (2.6.23-rc1 with Ingo's SCHED_LOAD_SCALE_FUZZ change):

sampling interval 60 sampling length 300 num tasks 10
---------------------------------------------------------
Task 0 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 51.14 48.00 -3.14 -6.54%
04:18:09.316 46.05 48.00 1.95 4.06%
04:19:09.317 44.13 48.00 3.87 8.06%
04:20:09.318 44.78 48.00 3.22 6.71%
04:21:09.320 45.35 48.00 2.65 5.52%
---------------------------------------------------------
Task 1 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 47.06 48.00 0.94 1.96%
04:18:09.316 45.80 48.00 2.20 4.58%
04:19:09.317 49.95 48.00 -1.95 -4.06%
04:20:09.318 47.59 48.00 0.41 0.85%
04:21:09.320 46.50 48.00 1.50 3.12%
---------------------------------------------------------
Task 2 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 48.14 48.00 -0.14 -0.29%
04:18:09.316 48.66 48.00 -0.66 -1.37%
04:19:09.317 47.52 48.00 0.48 1.00%
04:20:09.318 45.90 48.00 2.10 4.37%
04:21:09.320 45.89 48.00 2.11 4.40%
---------------------------------------------------------
Task 3 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 48.71 48.00 -0.71 -1.48%
04:18:09.316 44.70 48.00 3.30 6.87%
04:19:09.317 45.86 48.00 2.14 4.46%
04:20:09.318 45.85 48.00 2.15 4.48%
04:21:09.320 44.91 48.00 3.09 6.44%
---------------------------------------------------------
Task 4 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 49.24 48.00 -1.24 -2.58%
04:18:09.316 45.73 48.00 2.27 4.73%
04:19:09.317 49.13 48.00 -1.13 -2.35%
04:20:09.318 45.37 48.00 2.63 5.48%
04:21:09.320 45.75 48.00 2.25 4.69%
---------------------------------------------------------
Task 5 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 49.28 48.00 -1.28 -2.67%
04:18:09.316 50.46 48.00 -2.46 -5.12%
04:19:09.317 43.92 48.00 4.08 8.50%
04:20:09.318 55.18 48.00 -7.18 -14.96%
04:21:09.320 56.97 48.00 -8.97 -18.69%
---------------------------------------------------------
Task 6 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 45.22 48.00 2.78 5.79%
04:18:09.316 44.80 48.00 3.20 6.67%
04:19:09.317 45.32 48.00 2.68 5.58%
04:20:09.318 46.66 48.00 1.34 2.79%
04:21:09.320 48.25 48.00 -0.25 -0.52%
---------------------------------------------------------
Task 7 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 47.52 48.00 0.48 1.00%
04:18:09.316 49.99 48.00 -1.99 -4.15%
04:19:09.317 53.40 48.00 -5.40 -11.25%
04:20:09.318 48.85 48.00 -0.85 -1.77%
04:21:09.320 44.67 48.00 3.33 6.94%
---------------------------------------------------------
Task 8 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 45.69 48.00 2.31 4.81%
04:18:09.316 47.92 48.00 0.08 0.17%
04:19:09.317 48.49 48.00 -0.49 -1.02%
04:20:09.318 54.47 48.00 -6.47 -13.48%
04:21:09.320 57.80 48.00 -9.80 -20.42%
---------------------------------------------------------
Task 9 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:17:09.314 46.86 48.00 1.14 2.37%
04:18:09.316 54.75 48.00 -6.75 -14.06%
04:19:09.317 51.13 48.00 -3.13 -6.52%
04:20:09.318 44.22 48.00 3.78 7.87%
04:21:09.320 42.76 48.00 5.24 10.92%
---------------------------------------------------------
System-wide stats:
Sampling interval: 60 s
Sampling length 300 s
Num threads: 10
Num CPUs: 8
Max error: 10.92%
Min error: -20.42%

----------------------------------------------------------------
Results of my patch:

sampling interval 60 sampling length 300 num tasks 10
---------------------------------------------------------
Task 0 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.92 48.00 0.08 0.17%
04:43:35.249 47.89 48.00 0.11 0.23%
04:44:35.250 47.93 48.00 0.07 0.15%
04:45:35.252 47.89 48.00 0.11 0.23%
04:46:35.254 47.90 48.00 0.10 0.21%
---------------------------------------------------------
Task 1 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.88 48.00 0.12 0.25%
04:43:35.249 47.90 48.00 0.10 0.21%
04:44:35.250 47.86 48.00 0.14 0.29%
04:45:35.252 47.89 48.00 0.11 0.23%
04:46:35.254 47.87 48.00 0.13 0.27%
---------------------------------------------------------
Task 2 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.89 48.00 0.11 0.23%
04:43:35.249 47.88 48.00 0.12 0.25%
04:44:35.250 47.88 48.00 0.12 0.25%
04:45:35.252 47.91 48.00 0.09 0.19%
04:46:35.254 47.85 48.00 0.15 0.31%
---------------------------------------------------------
Task 3 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.87 48.00 0.13 0.27%
04:43:35.249 47.81 48.00 0.19 0.40%
04:44:35.250 47.85 48.00 0.15 0.31%
04:45:35.252 47.92 48.00 0.08 0.17%
04:46:35.254 47.83 48.00 0.17 0.35%
---------------------------------------------------------
Task 4 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.84 48.00 0.16 0.33%
04:43:35.249 47.83 48.00 0.17 0.35%
04:44:35.250 47.94 48.00 0.06 0.13%
04:45:35.252 47.87 48.00 0.13 0.27%
04:46:35.254 47.93 48.00 0.07 0.15%
---------------------------------------------------------
Task 5 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.84 48.00 0.16 0.33%
04:43:35.249 47.93 48.00 0.07 0.15%
04:44:35.250 47.88 48.00 0.12 0.25%
04:45:35.252 47.85 48.00 0.15 0.31%
04:46:35.254 47.92 48.00 0.08 0.17%
---------------------------------------------------------
Task 6 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.89 48.00 0.11 0.23%
04:43:35.249 47.90 48.00 0.10 0.21%
04:44:35.250 47.86 48.00 0.14 0.29%
04:45:35.252 47.89 48.00 0.11 0.23%
04:46:35.254 47.86 48.00 0.14 0.29%
---------------------------------------------------------
Task 7 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.93 48.00 0.07 0.15%
04:43:35.249 47.86 48.00 0.14 0.29%
04:44:35.250 47.89 48.00 0.11 0.23%
04:45:35.252 47.86 48.00 0.14 0.29%
04:46:35.254 47.91 48.00 0.09 0.19%
---------------------------------------------------------
Task 8 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.88 48.00 0.12 0.25%
04:43:35.249 47.87 48.00 0.13 0.27%
04:44:35.250 47.87 48.00 0.13 0.27%
04:45:35.252 47.87 48.00 0.13 0.27%
04:46:35.254 47.89 48.00 0.11 0.23%
---------------------------------------------------------
Task 9 (unit: second) nice 0 weight 1024
----------------------------------------
Wall Clock CPU Time Ideal Time Lag Error 04:42:35.248 47.87 48.00 0.13 0.27%
04:43:35.249 47.92 48.00 0.08 0.17%
04:44:35.250 47.90 48.00 0.10 0.21%
04:45:35.252 47.88 48.00 0.12 0.25%
04:46:35.254 47.88 48.00 0.12 0.25%
---------------------------------------------------------
System-wide stats:
Sampling interval: 60 s
Sampling length 300 s
Num threads: 10
Num CPUs: 8
Max error: 0.40%
Min error: 0.13%

tong
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/