Re: [PATCH v2] sched/core: Prioritize migrating eligible tasks in sched_balance_rq()

From: Hao Jia
Date: Mon Feb 17 2025 - 01:14:31 EST




On 2025/1/20 13:48, Hao Jia wrote:


On 2025/1/16 19:26, Vincent Guittot wrote:
On Wed, 15 Jan 2025 at 12:55, Hao Jia <jiahao.kernel@xxxxxxxxx> wrote:



On 2025/1/15 17:28, Vincent Guittot wrote:
On Wed, 15 Jan 2025 at 09:55, Hao Jia <jiahao.kernel@xxxxxxxxx> wrote:



On 2025/1/14 16:07, Vincent Guittot wrote:
On Tue, 14 Jan 2025 at 04:18, Hao Jia <jiahao.kernel@xxxxxxxxx> wrote:



On 2025/1/14 00:40, Vincent Guittot wrote:
On Mon, 13 Jan 2025 at 10:21, Hao Jia <jiahao.kernel@xxxxxxxxx> wrote:

Friendly ping...


On 2024/12/23 17:14, Hao Jia wrote:
From: Hao Jia <jiahao1@xxxxxxxxxxx>

When the PLACE_LAG scheduling feature is enabled and
dst_cfs_rq->nr_queued is greater than 1, if a task is
ineligible (lag < 0) on the source cpu runqueue, it will
also be ineligible when it is migrated to the destination
cpu runqueue. Because we will keep the original equivalent
lag of the task in place_entity(). So if the task was
ineligible before, it will still be ineligible after
migration.

So in sched_balance_rq(), we prioritize migrating eligible
tasks, and we soft-limit ineligible tasks, allowing them
to migrate only when nr_balance_failed is non-zero to
avoid load-balancing trying very hard to balance the load.

Could you explain why you think it's better to balance eligible tasks
in priority and potentially skip a load balance ?

In place_entity(), we maintain the task's original equivalent lag, even
if we migrate the task to dst_rq, this does not change its eligibility
attribute.

Yes, but you don't answer the question why it's better to select an
eligible task vs a non eligible task.


When there are multiple tasks on src_rq, and the dst_cpu has some
runnable tasks, migrating ineligible tasks to dst_rq will not allow them
to run. Therefore, such task migration is inefficient. We should

Why is it inefficient ? load balance is about evenly balancing the
number of tasks or the load between CPUs, it never says that the newly
migrated task should run immediately


My initial thought is that when we need to migrate some tasks during
load balancing, at the current point in time, migrating ineligible tasks
to dst_cpu means they definitely cannot run there. Therefore, I prefer
to keep them on src_cpu to reduce the overhead of dequeueing and
enqueueing ineligible tasks.

Sorry but I still don't get why it's important and would make a
difference. They are all runnable but ineligible tasks got more
runtime than other at that point in time so there is no real
difference


I adopt a lazy strategy for ineligible tasks. At the current point in
time, even if we migrate ineligible tasks to the dst CPU, they still
have to wait on the dst CPU until they become eligible. We do not see
clear benefits from migrating ineligible tasks, but their dequeueing and
enqueueing would instead incur overhead.

But your explanation doesn't make sense.
Not migrating an ineligible task only make sense for delayed_dequeue
tasks because they don't really want to run but only exhaust their lag
but this is already taken into account by
61b82dfb6b7e ("sched/fair: Do not try to migrate delayed dequeue task")


Thank you for your suggestion.

Yes, as you mentioned, this commit 61b82dfb6b7e ("sched/fair: Do not try to migrate delayed dequeue task") reduces the migration of delayed_dequeue tasks, but it doesn't work for ineligible RUNNING tasks and when the migration_type is migrate_load.


Did you run your benchmark on top of this change ?

My previous benchmark tests were based on the torvalds/linux/master branch, which does not include commit 61b82dfb6b7e ("sched/fair: Do not try to migrate delayed dequeue task"). I will include this commit and retest on my machine after my leave ends.


I'm really sorry for being away for so long. I have retested the hackbench on my machine, which includes the commit 61b82dfb6b7e ("sched/fair: Do not try to migrate delayed dequeue task"). Based on the hackbench test results, this patch still brings a slight performance improvement.

vanilla: Includes commit 61b82dfb6b7e ("sched/fair: Do not try to migrate delayed dequeue task"), but does not include my patch.

patched: Includes both the above commit and my patch.


hackbench-process-pipes
vanilla patched
Amean 1 0.4087 ( 0.00%) 0.4003 ( 2.04%)
Amean 4 1.7033 ( 0.00%) 1.7100 ( -0.39%)
Amean 7 2.9020 ( 0.00%) 2.8750 ( 0.93%)
Amean 12 4.2543 ( 0.00%) 4.2980 ( -1.03%)
Amean 21 5.8633 ( 0.00%) 5.7507 ( 1.92%)
Amean 30 7.3757 ( 0.00%) 7.2887 ( 1.18%)
Amean 48 10.5360 ( 0.00%) 10.2647 ( 2.58%)
Amean 79 16.5480 ( 0.00%) 15.9820 ( 3.42%)
Amean 96 19.7873 ( 0.00%) 19.1347 ( 3.30%)


hackbench-process-sockets
vanilla patched
Amean 1 0.7520 ( 0.00%) 0.7377 ( 1.91%)
Amean 4 2.5760 ( 0.00%) 2.5103 ( 2.55%)
Amean 7 4.3927 ( 0.00%) 4.2653 ( 2.90%)
Amean 12 7.3923 ( 0.00%) 7.1427 ( 3.38%)
Amean 21 12.3733 ( 0.00%) 11.9760 ( 3.21%)
Amean 30 17.2617 ( 0.00%) 16.7987 ( 2.68%)
Amean 48 28.8577 ( 0.00%) 28.1980 ( 2.29%)
Amean 79 50.0687 ( 0.00%) 49.0887 ( 1.96%)
Amean 96 62.1603 ( 0.00%) 61.1177 ( 1.68%)

FYI.

After the performance tests, I added some code in the can_migrate_task() to count the different reasons why tasks could not be migrated during the hackbench run.


hit_delayed_dequeue_cnt: Hit if (p->se.sched_delayed) && (env->migration_type != migrate_load)


hit_ineligible_cnt: Did not hit "hit_delayed_dequeue" && !env->sd->nr_balance_failed && task_is_ineligible_on_dst_cpu()


Count results:

hit_delayed_dequeue_cnt 378432
hit_ineligible_cnt 1862099



Thanks,
Hao