Re: [PATCH v2 15/23] sched/cache: Respect LLC preference in task migration and detach
From: Peter Zijlstra
Date: Wed Dec 10 2025 - 11:34:13 EST
On Wed, Dec 03, 2025 at 03:07:34PM -0800, Tim Chen wrote:
> @@ -10025,6 +10025,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> if (env->flags & LBF_ACTIVE_LB)
> return 1;
>
> +#ifdef CONFIG_SCHED_CACHE
> + if (sched_cache_enabled() &&
> + can_migrate_llc_task(env->src_cpu, env->dst_cpu, p) == mig_forbid &&
> + !task_has_sched_core(p))
> + return 0;
> +#endif
This seems wrong:
- it does not let nr_balance_failed override things;
- it takes precedence over migrate_degrades_locality(); you really want
to migrate towards the preferred NUMA node rather than stay on your LLC.
That is, this really wants to be done after migrate_degrades_locality()
and only if degrades == 0 or something.
> degrades = migrate_degrades_locality(p, env);
> if (!degrades)
> hot = task_hot(p, env);
> @@ -10146,12 +10153,55 @@ static struct list_head
> list_splice(&pref_old_llc, tasks);
> return tasks;
> }
> +
> +static bool stop_migrate_src_rq(struct task_struct *p,
> + struct lb_env *env,
> + int detached)
> +{
> + if (!sched_cache_enabled() || p->preferred_llc == -1 ||
> + cpus_share_cache(env->src_cpu, env->dst_cpu) ||
> + env->sd->nr_balance_failed)
> + return false;
But you are allowing nr_balance_failed to override things here.
> + /*
> + * Stop migration for the src_rq and pull from a
> + * different busy runqueue in the following cases:
> + *
> + * 1. Trying to migrate task to its preferred
> + * LLC, but the chosen task does not prefer dest
> + * LLC - case 3 in order_tasks_by_llc(). This violates
> + * the goal of migrate_llc_task. However, we should
> + * stop detaching only if some tasks have been detached
> + * and the imbalance has been mitigated.
> + *
> + * 2. Don't detach more tasks if the remaining tasks want
> + * to stay. We know the remaining tasks all prefer the
> + * current LLC, because after order_tasks_by_llc(), the
> + * tasks that prefer the current LLC are the least favored
> + * candidates to be migrated out.
> + */
> + if (env->migration_type == migrate_llc_task &&
> + detached && llc_id(env->dst_cpu) != p->preferred_llc)
> + return true;
> +
> + if (llc_id(env->src_cpu) == p->preferred_llc)
> + return true;
> +
> + return false;
> +}
Also, I think we have a problem with nr_balance_failed: cache_nice_tries
is 1 for SHARE_LLC; this means for failed == 0 we ignore:
- ineligible tasks
- llc fail
- node-degrading / hot
and then, the very next round, we do all of them at once, with no
gradation in between.
> @@ -10205,6 +10255,15 @@ static int detach_tasks(struct lb_env *env)
>
> p = list_last_entry(tasks, struct task_struct, se.group_node);
>
> + /*
> + * Check if detaching current src_rq should be stopped, because
> + * doing so would break cache aware load balance. If we stop
> + * here, the env->flags has LBF_ALL_PINNED, which would cause
> + * the load balance to pull from another busy runqueue.
Uhh, can_migrate_task() will already have cleared LBF_ALL_PINNED if at
least one task passed the cpumask check before we get here.
> + */
> + if (stop_migrate_src_rq(p, env, detached))
> + break;
Perhaps split cfs_tasks into multiple lists from the get-go? That avoids
this sorting.