Re: [PATCH 15/19] sched/fair: Respect LLC preference in task migration and detach

From: K Prateek Nayak

Date: Tue Oct 28 2025 - 23:54:14 EST


Hello Chenyu,

On 10/28/2025 5:28 PM, Chen, Yu C wrote:
> Hi Prateek,
>
> On 10/28/2025 2:02 PM, K Prateek Nayak wrote:
>> Hello Tim,
>>
>> On 10/11/2025 11:54 PM, Tim Chen wrote:
>>> @@ -9969,6 +9969,12 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>>       if (env->flags & LBF_ACTIVE_LB)
>>>           return 1;
>>>   +#ifdef CONFIG_SCHED_CACHE
>>> +    if (sched_cache_enabled() &&
>>> +        can_migrate_llc_task(env->src_cpu, env->dst_cpu, p) == mig_forbid)
>>> +        return 0;
>>> +#endif
>>> +
>>>       degrades = migrate_degrades_locality(p, env);
>>>       if (!degrades)
>>>           hot = task_hot(p, env);
>>
>> Should we care for task_hot() w.r.t. migration cost if a task is being
>> moved to a preferred LLC?
>>
>
> This is a good question. The decision not to migrate a task when its
> LLC preference is violated takes priority over the check in task_hot().
>
> The main reason is that we want cache aware aggregation to be more
> aggressive than generic migration; otherwise, cache-aware migration
> might not take effect, according to our previous tests. This seems to
> be a trade-off. Another consideration might be: should we consider
> the occupancy of a single thread or that of the entire process?
> For example, suppose t0, t1, and t2 belong to the same process. t0
> and t1 are running on the process's preferred LLC0, while t2 is
> running on the non-preferred LLC1. Even though t2 has high occupancy
> on LLC1 (making it cache-hot on LLC1), we might still want to move t2
> to LLC0 if t0, t1, and t2 read from and write to each other, since
> we don't want to generate cross-LLC accesses.

Makes sense. That would need some heuristics based on the avg_running
to know which LLC could be a potential target with the fewest
migrations. But then again, in a dynamic system things change so
quickly - what you have now seems to be a good start to further
optimize on top of.

>
>> Also, should we leave out tasks under core scheduling from the llc
>> aware lb? Even discount them when calculating "mm->nr_running_avg"?
>>
> Yes, it seems that the cookie match check case was missed; it is
> embedded in task_hot(). I suppose you are referring to the
> p->core_cookie check; I'll look into this.

Yup! I think if a user has opted into core scheduling, they should
ideally not bother with cache aware scheduling.
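One way this could look: bail out of the LLC-preference path early for
cookied tasks. The snippet below is a user-space mock of the idea, not
the series' actual code: the struct, enum, and function body are
stand-ins, and only the names can_migrate_llc_task() and core_cookie
mirror the thread above.

```c
/* Mocked-up types for illustration; the real ones live in the kernel. */
enum llc_mig { mig_forbid, mig_llc, mig_unrestricted };

struct task_struct {
	unsigned long core_cookie;	/* non-zero under core scheduling */
	int preferred_llc;		/* -1 if no LLC preference */
};

/*
 * Sketch of the idea discussed above: if a task participates in core
 * scheduling (p->core_cookie != 0), leave it out of cache-aware
 * placement entirely and fall back to unrestricted migration.
 */
static enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
					 struct task_struct *p)
{
	(void)src_cpu;
	(void)dst_cpu;

	if (p->core_cookie)
		return mig_unrestricted;

	/* ... the existing LLC-preference checks would follow here ... */
	return p->preferred_llc == -1 ? mig_unrestricted : mig_llc;
}
```

The same cookie test could likewise gate the task's contribution to
"mm->nr_running_avg" so cookied threads don't skew the per-process
statistics.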

>
>>> @@ -10227,6 +10233,20 @@ static int detach_tasks(struct lb_env *env)
>>>           if (env->imbalance <= 0)
>>>               break;
>>>   +#ifdef CONFIG_SCHED_CACHE
>>> +        /*
>>> +         * Don't detach more tasks if the remaining tasks want
>>> +         * to stay. We know the remaining tasks all prefer the
>>> +         * current LLC, because after order_tasks_by_llc(), the
>>> +         * tasks that prefer the current LLC are at the tail of
>>> +         * the list. The inhibition of detachment is to avoid too
>>> +         * many tasks being migrated out of the preferred LLC.
>>> +         */
>>> +        if (sched_cache_enabled() && detached && p->preferred_llc != -1 &&
>>> +            llc_id(env->src_cpu) == p->preferred_llc)
>>> +            break;
>>
>> In all cases? Should we check can_migrate_llc() wrt to util migrated and
>> then make a call if we should move the preferred LLC tasks or not?
>>
>
> Prior to this "stop of detaching tasks", we performed a can_migrate_task(p)
> to determine whether the detached p was dequeued from its preferred LLC,
> and in can_migrate_task() we use can_migrate_llc_task() -> can_migrate_llc()
> to carry out the check. That is to say, we only stop further detaching
> once some tasks have already been detached.
>
>> Perhaps disallow it the first time if "nr_balance_failed" is 0 but
>> subsequent failed attempts should perhaps explore breaking the preferred
>> llc restriction if there is an imbalance and we are under
>> "mig_unrestricted" conditions.
>>
>
> I suppose you are suggesting that the threshold for stopping task
> detachment should be higher. Hasn't the can_migrate_llc() check above
> already raised the threshold for stopping "task detachment"?

Say the LLC is under heavy load and we only have overloaded groups.
can_migrate_llc() would return "mig_unrestricted" since
fits_llc_capacity() would return false.

Since we are under "migrate_load", sched_balance_find_src_rq() has
returned the CPU with the highest load, which could very well be the
CPU with a large number of preferred-LLC tasks.

sched_cache_enabled() is still true, and when detach_tasks() reaches
one of these preferred-LLC tasks (which come at the very end of the
task list), we break out even if env->imbalance > 0, leaving a
potential imbalance for the "migrate_load" case.

Instead, we could account for the util moved out of the src_llc and,
after accounting for it, check whether can_migrate_llc() would return
"mig_forbid" for the src llc.
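The suggestion above could be sketched as follows. This is a
hypothetical user-space mock, not kernel code: should_stop_detach(),
the detached_util/src_llc_slack parameters, and the capacity model
inside the mocked can_migrate_llc() are all invented for illustration.

```c
/* Mocked types and capacity model, for illustration only. */
enum llc_mig { mig_forbid, mig_llc, mig_unrestricted };

/*
 * Pretend capacity check: forbid further moves out of the source LLC
 * once the util already detached exceeds the slack the LLC can give
 * up.  The real check would live in can_migrate_llc().
 */
static enum llc_mig can_migrate_llc(unsigned long detached_util,
				    unsigned long src_llc_slack)
{
	return detached_util >= src_llc_slack ? mig_forbid : mig_unrestricted;
}

/*
 * Sketch of the suggestion: instead of breaking out of detach_tasks()
 * unconditionally on the first preferred-LLC task, keep a running
 * total of the util moved and only stop once the source LLC can no
 * longer afford losing more.
 */
static int should_stop_detach(unsigned long detached_util,
			      unsigned long src_llc_slack,
			      int task_prefers_src_llc)
{
	if (!task_prefers_src_llc)
		return 0;

	return can_migrate_llc(detached_util, src_llc_slack) == mig_forbid;
}
```

That way, the heavily loaded scenario above can keep draining imbalance
past the first preferred-LLC task, and the early break only triggers
once the source LLC genuinely cannot spare more util.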

--
Thanks and Regards,
Prateek