Re: [PATCH v2 15/23] sched/cache: Respect LLC preference in task migration and detach
From: Chen, Yu C
Date: Tue Dec 16 2025 - 02:33:10 EST
On 12/11/2025 12:30 AM, Peter Zijlstra wrote:
> On Wed, Dec 03, 2025 at 03:07:34PM -0800, Tim Chen wrote:
>> @@ -10025,6 +10025,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>> 	if (env->flags & LBF_ACTIVE_LB)
>> 		return 1;
>> +#ifdef CONFIG_SCHED_CACHE
>> +	if (sched_cache_enabled() &&
>> +	    can_migrate_llc_task(env->src_cpu, env->dst_cpu, p) == mig_forbid &&
>> +	    !task_has_sched_core(p))
>> +		return 0;
>> +#endif
> This seems wrong:
>
>  - it does not let nr_balance_failed override things;
>
>  - it takes precedence over migrate_degrades_locality(); you really want
>    to migrate towards the preferred NUMA node over staying on your LLC.
>
> That is, this really wants to be done after migrate_degrades_locality()
> and only if degrades == 0 or something.
OK, will fix it. I'll move the check after the existing locality/hotness
test:

	degrades = migrate_degrades_locality(p, env);
	if (!degrades)
		hot = task_hot(p, env);
>> @@ -10146,12 +10153,55 @@ static struct list_head
>> 	list_splice(&pref_old_llc, tasks);
>> 	return tasks;
>>  }
>> +
>> +static bool stop_migrate_src_rq(struct task_struct *p,
>> +				struct lb_env *env,
>> +				int detached)
>> +{
>> +	if (!sched_cache_enabled() || p->preferred_llc == -1 ||
>> +	    cpus_share_cache(env->src_cpu, env->dst_cpu) ||
>> +	    env->sd->nr_balance_failed)
>> +		return false;
> But you are allowing nr_balance_failed to override things here.
>> +	/*
>> +	 * Stop migration for the src_rq and pull from a
>> +	 * different busy runqueue in the following cases:
>> +	 *
>> +	 * 1. Trying to migrate task to its preferred
>> +	 *    LLC, but the chosen task does not prefer dest
>> +	 *    LLC - case 3 in order_tasks_by_llc(). This violates
>> +	 *    the goal of migrate_llc_task. However, we should
>> +	 *    stop detaching only if some tasks have been detached
>> +	 *    and the imbalance has been mitigated.
>> +	 *
>> +	 * 2. Don't detach more tasks if the remaining tasks want
>> +	 *    to stay. We know the remaining tasks all prefer the
>> +	 *    current LLC, because after order_tasks_by_llc(), the
>> +	 *    tasks that prefer the current LLC are the least favored
>> +	 *    candidates to be migrated out.
>> +	 */
>> +	if (env->migration_type == migrate_llc_task &&
>> +	    detached && llc_id(env->dst_cpu) != p->preferred_llc)
>> +		return true;
>> +
>> +	if (llc_id(env->src_cpu) == p->preferred_llc)
>> +		return true;
>> +
>> +	return false;
>> +}
> Also, I think we have a problem with nr_balance_failed: cache_nice_tries
> is 1 for SHARE_LLC; this means for failed=0 we ignore:
>
>  - ineligible tasks
>  - llc fail
>  - node-degrading / hot
>
> and then the very next round, we do all of them at once, without much
> grading.
Do you mean we could set different thresholds for the different
scenarios you mentioned above, so as to avoid bypassing all of the
checks at once in detach_tasks()?

For example,

ineligible tasks check:
	if (env->sd->nr_balance_failed > env->sd->cache_nice_tries)
		can_migrate;

llc fail check:
	if (env->sd->nr_balance_failed > env->sd->cache_nice_tries + 1)
		can_migrate;

node-degrading/hot check:
	if (env->sd->nr_balance_failed > env->sd->cache_nice_tries + 2)
		can_migrate;
>> @@ -10205,6 +10255,15 @@ static int detach_tasks(struct lb_env *env)
>> 		p = list_last_entry(tasks, struct task_struct, se.group_node);
>> +		/*
>> +		 * Check if detaching current src_rq should be stopped, because
>> +		 * doing so would break cache aware load balance. If we stop
>> +		 * here, the env->flags has LBF_ALL_PINNED, which would cause
>> +		 * the load balance to pull from another busy runqueue.
> Uhh, can_migrate_task() will clear that ALL_PINNED thing if we've found
> at least one task before getting here.
One problem is that LBF_ALL_PINNED is cleared before
migrate_degrades_locality()/can_migrate_llc_task() run in detach_tasks().
I suppose we want to keep LBF_ALL_PINNED set if can_migrate_llc_task()
failed (i.e., the migration would break LLC locality).
>> +		 */
>> +		if (stop_migrate_src_rq(p, env, detached))
>> +			break;
> Perhaps split cfs_tasks into multiple lists from the get-go? That avoids
> this sorting.
Will check with Tim on this.
thanks,
Chenyu