Re: [PATCH 3/6] sched/numa: Avoid task migration for small numa improvement
From: Ingo Molnar
Date: Mon Sep 10 2018 - 04:46:41 EST
* Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx> wrote:
> If numa improvement from the task migration is going to be very
> minimal, then avoid task migration.
>
> specjbb2005 / bops/JVM / higher bops are better
> on 2 Socket/2 Node Intel
> JVMS Prev Current %Change
> 4 200892 210118 4.59252
> 1 325766 313171 -3.86627
>
>
> on 2 Socket/4 Node Power8 (PowerNV)
> JVMS Prev Current %Change
> 8 89011.9 91027.5 2.26442
> 1 211338 216460 2.42361
>
>
> on 2 Socket/2 Node Power9 (PowerNV)
> JVMS Prev Current %Change
> 4 190261 191918 0.870909
> 1 195305 207043 6.01009
>
>
> on 4 Socket/4 Node Power7
> JVMS Prev Current %Change
> 8 57651.1 58462.1 1.40674
> 1 111351 108334 -2.70945
>
>
> dbench / transactions / higher numbers are better
> on 2 Socket/2 Node Intel
> count Min Max Avg Variance %Change
> 5 12254.7 12331.9 12297.8 28.1846
> 5 11851.8 11937.3 11890.9 33.5169 -3.30872
>
>
> on 2 Socket/4 Node Power8 (PowerNV)
> count Min Max Avg Variance %Change
> 5 4997.83 5030.14 5015.54 12.947
> 5 4791 5016.08 4962.55 85.9625 -1.05652
>
>
> on 2 Socket/2 Node Power9 (PowerNV)
> count Min Max Avg Variance %Change
> 5 9331.84 9375.11 9352.04 16.0703
> 5 9353.43 9380.49 9369.6 9.04361 0.187767
>
>
> on 4 Socket/4 Node Power7
> count Min Max Avg Variance %Change
> 5 147.55 181.605 168.963 11.3513
> 5 149.518 215.412 179.083 21.5903 5.98948
>
> Signed-off-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
> ---
> Changelog v1->v2:
> - Handle trivial changes due to variable name change. (Rik Van Riel)
> - Drop changes where subsequent better cpu find was rejected for
> small numa improvement (Rik Van Riel).
>
> kernel/sched/fair.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5cf921a..a717870 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1568,6 +1568,13 @@ static bool load_too_imbalanced(long src_load, long dst_load,
> }
>
> /*
> + * Maximum numa importance can be 1998 (2*999);
> + * SMALLIMP @ 30 would be close to 1998/64.
> + * Used to deter task migration.
> + */
> +#define SMALLIMP 30
> +
> +/*
> * This checks if the overall compute and NUMA accesses of the system would
> * be improved if the source tasks was migrated to the target dst_cpu taking
> * into account that it might be best if task running on the dst_cpu should
> @@ -1600,7 +1607,7 @@ static void task_numa_compare(struct task_numa_env *env,
> goto unlock;
>
> if (!cur) {
> - if (maymove || imp > env->best_imp)
> + if (maymove && moveimp >= env->best_imp)
> goto assign;
> else
> goto unlock;
> @@ -1643,16 +1650,22 @@ static void task_numa_compare(struct task_numa_env *env,
> task_weight(cur, env->dst_nid, dist);
> }
>
> - if (imp <= env->best_imp)
> - goto unlock;
> -
> if (maymove && moveimp > imp && moveimp > env->best_imp) {
> - imp = moveimp - 1;
> + imp = moveimp;
> cur = NULL;
> goto assign;
> }
>
> /*
> + * If the numa importance is less than SMALLIMP,
> + * task migration might only result in ping pong
> + * of tasks and also hurt performance due to cache
> + * misses.
> + */
> + if (imp < SMALLIMP || imp <= env->best_imp + SMALLIMP / 2)
> + goto unlock;
> +
> + /*
> * In the overloaded case, try and keep the load balanced.
> */
> load = task_h_load(env->p) - task_h_load(cur);
So what is this 'NUMA importance'? Seems just like a random parameter which generally isn't a
good idea.
Also, same review feedback as I gave for the previous patches.
Thanks,
Ingo