Re: [RFC] scheduler: improve SMP fairness in CFS

From: Ingo Molnar
Date: Wed Jul 25 2007 - 08:04:20 EST



* Ingo Molnar <mingo@xxxxxxx> wrote:

> > This patch extends CFS to achieve better fairness for SMPs. For
> > example, with 10 tasks (same priority) on 8 CPUs, it enables each task
> > to receive equal CPU time (80%). [...]
>
> hm, CFS should already offer reasonable long-term SMP fairness. It
> certainly works on a dual-core box, i just started 3 tasks of the same
> priority on 2 CPUs, and on vanilla 2.6.23-rc1 the distribution is
> this:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 7084 mingo 20 0 1576 248 196 R 67 0.0 0:50.13 loop
> 7083 mingo 20 0 1576 244 196 R 66 0.0 0:48.86 loop
> 7085 mingo 20 0 1576 244 196 R 66 0.0 0:49.45 loop
>
> so each task gets a perfect 66% of CPU time.
>
> prior CFS, we indeed did a 50%/50%/100% split - so for example on
> v2.6.22:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 2256 mingo 25 0 1580 248 196 R 100 0.0 1:03.19 loop
> 2255 mingo 25 0 1580 248 196 R 50 0.0 0:31.79 loop
> 2257 mingo 25 0 1580 248 196 R 50 0.0 0:31.69 loop
>
> but CFS has changed that behavior.
>
> I'll check your 10-tasks-on-8-cpus example on an 8-way box too, maybe
> we regressed somewhere ...

ok, i just tried it on an 8-cpu box and indeed, unlike the dual-core
case, the scheduler does not distribute tasks well enough:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2572 mingo 20 0 1576 244 196 R 100 0.0 1:03.61 loop
2578 mingo 20 0 1576 248 196 R 100 0.0 1:03.59 loop
2576 mingo 20 0 1576 248 196 R 100 0.0 1:03.52 loop
2571 mingo 20 0 1576 244 196 R 100 0.0 1:03.46 loop
2569 mingo 20 0 1576 244 196 R 99 0.0 1:03.36 loop
2570 mingo 20 0 1576 244 196 R 95 0.0 1:00.55 loop
2577 mingo 20 0 1576 248 196 R 50 0.0 0:31.88 loop
2574 mingo 20 0 1576 248 196 R 50 0.0 0:31.87 loop
2573 mingo 20 0 1576 248 196 R 50 0.0 0:31.86 loop
2575 mingo 20 0 1576 248 196 R 50 0.0 0:31.86 loop

but this is relatively easy to fix - with the patch below applied, it
looks a lot better:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2681 mingo 20 0 1576 244 196 R 85 0.0 3:51.68 loop
2688 mingo 20 0 1576 244 196 R 81 0.0 3:46.35 loop
2682 mingo 20 0 1576 244 196 R 80 0.0 3:43.68 loop
2685 mingo 20 0 1576 248 196 R 80 0.0 3:45.97 loop
2683 mingo 20 0 1576 248 196 R 80 0.0 3:40.25 loop
2679 mingo 20 0 1576 244 196 R 80 0.0 3:33.53 loop
2680 mingo 20 0 1576 244 196 R 79 0.0 3:43.53 loop
2686 mingo 20 0 1576 244 196 R 79 0.0 3:39.31 loop
2687 mingo 20 0 1576 244 196 R 78 0.0 3:33.31 loop
2684 mingo 20 0 1576 244 196 R 77 0.0 3:27.52 loop

they now nicely converte to the expected 80% long-term CPU usage.

so, could you please try the patch below, does it work for you too?

Ingo

--------------------------->
Subject: sched: increase SCHED_LOAD_SCALE_FUZZ
From: Ingo Molnar <mingo@xxxxxxx>

increase SCHED_LOAD_SCALE_FUZZ that adds a small amount of
over-balancing: to help distribute CPU-bound tasks more fairly on SMP
systems.

the problem of unfair balancing was noticed and reported by Tong N Li.

10 CPU-bound tasks running on 8 CPUs, v2.6.23-rc1:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2572 mingo 20 0 1576 244 196 R 100 0.0 1:03.61 loop
2578 mingo 20 0 1576 248 196 R 100 0.0 1:03.59 loop
2576 mingo 20 0 1576 248 196 R 100 0.0 1:03.52 loop
2571 mingo 20 0 1576 244 196 R 100 0.0 1:03.46 loop
2569 mingo 20 0 1576 244 196 R 99 0.0 1:03.36 loop
2570 mingo 20 0 1576 244 196 R 95 0.0 1:00.55 loop
2577 mingo 20 0 1576 248 196 R 50 0.0 0:31.88 loop
2574 mingo 20 0 1576 248 196 R 50 0.0 0:31.87 loop
2573 mingo 20 0 1576 248 196 R 50 0.0 0:31.86 loop
2575 mingo 20 0 1576 248 196 R 50 0.0 0:31.86 loop

v2.6.23-rc1 + patch:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2681 mingo 20 0 1576 244 196 R 85 0.0 3:51.68 loop
2688 mingo 20 0 1576 244 196 R 81 0.0 3:46.35 loop
2682 mingo 20 0 1576 244 196 R 80 0.0 3:43.68 loop
2685 mingo 20 0 1576 248 196 R 80 0.0 3:45.97 loop
2683 mingo 20 0 1576 248 196 R 80 0.0 3:40.25 loop
2679 mingo 20 0 1576 244 196 R 80 0.0 3:33.53 loop
2680 mingo 20 0 1576 244 196 R 79 0.0 3:43.53 loop
2686 mingo 20 0 1576 244 196 R 79 0.0 3:39.31 loop
2687 mingo 20 0 1576 244 196 R 78 0.0 3:33.31 loop
2684 mingo 20 0 1576 244 196 R 77 0.0 3:27.52 loop

so they now nicely converte to the expected 80% long-term CPU usage.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
include/linux/sched.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -681,7 +681,7 @@ enum cpu_idle_type {
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1L << SCHED_LOAD_SHIFT)

-#define SCHED_LOAD_SCALE_FUZZ (SCHED_LOAD_SCALE >> 5)
+#define SCHED_LOAD_SCALE_FUZZ (SCHED_LOAD_SCALE >> 1)

#ifdef CONFIG_SMP
#define SD_LOAD_BALANCE 1 /* Do load balancing on this domain. */
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/