Re: [PATCH v2] sched/core: Prioritize migrating eligible tasks in sched_balance_rq()

From: Vincent Guittot
Date: Mon Jan 13 2025 - 11:40:58 EST


On Mon, 13 Jan 2025 at 10:21, Hao Jia <jiahao.kernel@xxxxxxxxx> wrote:
>
> Friendly ping...
>
>
> On 2024/12/23 17:14, Hao Jia wrote:
> > From: Hao Jia <jiahao1@xxxxxxxxxxx>
> >
> > When the PLACE_LAG scheduling feature is enabled and
> > dst_cfs_rq->nr_queued is greater than 1, if a task is
> > ineligible (lag < 0) on the source cpu runqueue, it will
> > also be ineligible when it is migrated to the destination
> > cpu runqueue. Because we will keep the original equivalent
> > lag of the task in place_entity(). So if the task was
> > ineligible before, it will still be ineligible after
> > migration.
> >
> > So in sched_balance_rq(), we prioritize migrating eligible
> > tasks, and we soft-limit ineligible tasks, allowing them
> > to migrate only when nr_balance_failed is non-zero to
> > avoid load-balancing trying very hard to balance the load.

Could you explain why you think it's better to balance eligible tasks
in priority and potentially skip a load balance ?

I can see an interest for idle and newly_idle load balance in order to
favor fairness as tasks will become eligible but I don't see why it
would be helpful if dst already has some runnable tasks. Furthermore,
when a cpu is idle or newly idle, we really want to migrate a task
even an non eligible one instead of possibly skipping this load
balance round. With your patch, we might end up not pulling any task,
increasing the nr_balance_failed and waiting next load balance

> >
> > Below are some benchmark test results. From my test results,
> > this patch shows a slight improvement on hackbench.
> >
> > Benchmark
> > =========
> >
> > All of the benchmarks are done inside a normal cpu cgroup in a
> > clean environment with cpu turbo disabled, and test machine is:
> >
> > Single NUMA machine model is 13th Gen Intel(R) Core(TM)
> > i7-13700, 12 Core/24 HT.
> >
> > Based on master b86545e02e8c.
> >
> > Results
> > =======
> >
> > hackbench-process-pipes
> > vanilla patched
> > Amean 1 0.5837 ( 0.00%) 0.5733 ( 1.77%)
> > Amean 4 1.4423 ( 0.00%) 1.4503 ( -0.55%)
> > Amean 7 2.5147 ( 0.00%) 2.4773 ( 1.48%)
> > Amean 12 3.9347 ( 0.00%) 3.8880 ( 1.19%)
> > Amean 21 5.3943 ( 0.00%) 5.3873 ( 0.13%)
> > Amean 30 6.7840 ( 0.00%) 6.6660 ( 1.74%)
> > Amean 48 9.8313 ( 0.00%) 9.6100 ( 2.25%)
> > Amean 79 15.4403 ( 0.00%) 14.9580 ( 3.12%)
> > Amean 96 18.4970 ( 0.00%) 17.9533 ( 2.94%)
> >
> > hackbench-process-sockets
> > vanilla patched
> > Amean 1 0.6297 ( 0.00%) 0.6223 ( 1.16%)
> > Amean 4 2.1517 ( 0.00%) 2.0887 ( 2.93%)
> > Amean 7 3.6377 ( 0.00%) 3.5670 ( 1.94%)
> > Amean 12 6.1277 ( 0.00%) 5.9290 ( 3.24%)
> > Amean 21 10.0380 ( 0.00%) 9.7623 ( 2.75%)
> > Amean 30 14.1517 ( 0.00%) 13.7513 ( 2.83%)
> > Amean 48 24.7253 ( 0.00%) 24.2287 ( 2.01%)
> > Amean 79 43.9523 ( 0.00%) 43.2330 ( 1.64%)
> > Amean 96 54.5310 ( 0.00%) 53.7650 ( 1.40%)
> >
> > tbench4 Throughput
> > vanilla patched
> > Hmean 1 255.97 ( 0.00%) 275.01 ( 7.44%)
> > Hmean 2 511.60 ( 0.00%) 544.27 ( 6.39%)
> > Hmean 4 996.70 ( 0.00%) 1006.57 ( 0.99%)
> > Hmean 8 1646.46 ( 0.00%) 1649.15 ( 0.16%)
> > Hmean 16 2259.42 ( 0.00%) 2274.35 ( 0.66%)
> > Hmean 32 4725.48 ( 0.00%) 4735.57 ( 0.21%)
> > Hmean 64 4411.47 ( 0.00%) 4400.05 ( -0.26%)
> > Hmean 96 4284.31 ( 0.00%) 4267.39 ( -0.39%)
> >
> > Signed-off-by: Hao Jia <jiahao1@xxxxxxxxxxx>
> > Suggested-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > ---
> > Previous discussion link: https://lore.kernel.org/all/20241128084858.25220-1-jiahao.kernel@xxxxxxxxx
> > Link to v1: https://lore.kernel.org/all/20241218080203.80556-1-jiahao.kernel@xxxxxxxxx
> >
> > v1 to v2:
> > - Modify dst_cfs_rq->nr_running to dst_cfs_rq->nr_queued to
> > resolve conflicts with commit 736c55a02c47 ("sched/fair:
> > Rename cfs_rq.nr_running into nr_queued").
> >
> > kernel/sched/fair.c | 34 ++++++++++++++++++++++++++++++++++
> > 1 file changed, 34 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5599b0c1ba9b..c884bf631e66 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9396,6 +9396,30 @@ static inline int migrate_degrades_locality(struct task_struct *p,
> > }
> > #endif
> >
> > +/*
> > + * Check whether the task is ineligible on the destination cpu
> > + *
> > + * When the PLACE_LAG scheduling feature is enabled and
> > + * dst_cfs_rq->nr_queued is greater than 1, if the task
> > + * is ineligible, it will also be ineligible when
> > + * it is migrated to the destination cpu.
> > + */
> > +static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_cpu)
> > +{
> > + struct cfs_rq *dst_cfs_rq;
> > +
> > +#ifdef CONFIG_FAIR_GROUP_SCHED
> > + dst_cfs_rq = task_group(p)->cfs_rq[dest_cpu];
> > +#else
> > + dst_cfs_rq = &cpu_rq(dest_cpu)->cfs;
> > +#endif
> > + if (sched_feat(PLACE_LAG) && dst_cfs_rq->nr_queued &&
> > + !entity_eligible(task_cfs_rq(p), &p->se))
> > + return 1;
> > +
> > + return 0;
> > +}
> > +
> > /*
> > * can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
> > */
> > @@ -9420,6 +9444,16 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> > if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
> > return 0;
> >
> > + /*
> > + * We want to prioritize the migration of eligible tasks.
> > + * For ineligible tasks we soft-limit them and only allow
> > + * them to migrate when nr_balance_failed is non-zero to
> > + * avoid load-balancing trying very hard to balance the load.
> > + */
> > + if (!env->sd->nr_balance_failed &&
> > + task_is_ineligible_on_dst_cpu(p, env->dst_cpu))
> > + return 0;
> > +
> > /* Disregard percpu kthreads; they are where they need to be. */
> > if (kthread_is_per_cpu(p))
> > return 0;