[PATCH 3/6] sched/numa: Avoid task migration for small numa improvement

From: Srikar Dronamraju
Date: Fri Aug 03 2018 - 02:14:59 EST


If the NUMA improvement from a task migration would be very small,
avoid the migration. Such migrations tend to ping-pong tasks between
nodes and hurt performance through cache misses.
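
For illustration only (not part of the patch): a minimal standalone
sketch of the threshold test this change adds to task_numa_compare().
The helper name numa_imp_too_small() is hypothetical; in the patch the
condition is written inline and compares imp against env->best_imp.

```c
#include <stdbool.h>

/*
 * From the patch: the maximum NUMA importance is 1998 (2 * 999),
 * so a threshold of 30 is close to 1998 / 64 (about 31).
 */
#define SMALLIMP 30

/*
 * Hypothetical helper, not a kernel function.
 * Returns true when a candidate should be skipped, either because
 * its improvement is below SMALLIMP or because it does not beat the
 * best improvement seen so far by more than SMALLIMP / 2.
 */
static inline bool numa_imp_too_small(long imp, long best_imp)
{
	return imp < SMALLIMP || imp <= best_imp + SMALLIMP / 2;
}
```

With these values, a candidate with imp = 100 is still skipped when
best_imp is 90, because it does not exceed 90 + 15.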

specjbb2005 / bops/JVM / higher bops are better
on 2 Socket/2 Node Intel
JVMS  Prev    Current  %Change
4     200892  210118   4.59252
1     325766  313171   -3.86627


on 2 Socket/4 Node Power8 (PowerNV)
JVMS  Prev     Current  %Change
8     89011.9  91027.5  2.26442
1     211338   216460   2.42361


on 2 Socket/2 Node Power9 (PowerNV)
JVMS  Prev    Current  %Change
4     190261  191918   0.870909
1     195305  207043   6.01009


on 4 Socket/4 Node Power7
JVMS  Prev     Current  %Change
8     57651.1  58462.1  1.40674
1     111351   108334   -2.70945


dbench / transactions / higher numbers are better
on 2 Socket/2 Node Intel
count  Min      Max      Avg      Variance  %Change
5      12254.7  12331.9  12297.8  28.1846
5      11851.8  11937.3  11890.9  33.5169   -3.30872


on 2 Socket/4 Node Power8 (PowerNV)
count  Min      Max      Avg      Variance  %Change
5      4997.83  5030.14  5015.54  12.947
5      4791     5016.08  4962.55  85.9625   -1.05652


on 2 Socket/2 Node Power9 (PowerNV)
count  Min      Max      Avg      Variance  %Change
5      9331.84  9375.11  9352.04  16.0703
5      9353.43  9380.49  9369.6   9.04361   0.187767


on 4 Socket/4 Node Power7
count  Min      Max      Avg      Variance  %Change
5      147.55   181.605  168.963  11.3513
5      149.518  215.412  179.083  21.5903   5.98948

Signed-off-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
---
Changelog v1->v2:
- Handle trivial changes due to variable name change. (Rik Van Riel)
- Drop the change that rejected a subsequently found better CPU
  because of a small NUMA improvement. (Rik Van Riel)

kernel/sched/fair.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5cf921a..a717870 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1568,6 +1568,13 @@ static bool load_too_imbalanced(long src_load, long dst_load,
}

/*
+ * Maximum numa importance can be 1998 (2*999);
+ * SMALLIMP @ 30 would be close to 1998/64.
+ * Used to deter task migration.
+ */
+#define SMALLIMP 30
+
+/*
* This checks if the overall compute and NUMA accesses of the system would
* be improved if the source tasks was migrated to the target dst_cpu taking
* into account that it might be best if task running on the dst_cpu should
@@ -1600,7 +1607,7 @@ static void task_numa_compare(struct task_numa_env *env,
goto unlock;

if (!cur) {
- if (maymove || imp > env->best_imp)
+ if (maymove && moveimp >= env->best_imp)
goto assign;
else
goto unlock;
@@ -1643,16 +1650,22 @@ static void task_numa_compare(struct task_numa_env *env,
task_weight(cur, env->dst_nid, dist);
}

- if (imp <= env->best_imp)
- goto unlock;
-
if (maymove && moveimp > imp && moveimp > env->best_imp) {
- imp = moveimp - 1;
+ imp = moveimp;
cur = NULL;
goto assign;
}

/*
+ * If the numa importance is less than SMALLIMP,
+ * task migration might only result in ping pong
+ * of tasks and also hurt performance due to cache
+ * misses.
+ */
+ if (imp < SMALLIMP || imp <= env->best_imp + SMALLIMP / 2)
+ goto unlock;
+
+ /*
* In the overloaded case, try and keep the load balanced.
*/
load = task_h_load(env->p) - task_h_load(cur);
--
1.8.3.1