Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in select_idle_cpu()

From: K Prateek Nayak
Date: Mon Sep 11 2023 - 17:29:49 EST


Hello Chenyu,

On 9/11/2023 8:20 AM, Chen Yu wrote:
> [..snip..]
> kernel/sched/fair.c | 30 +++++++++++++++++++++++++++---
> kernel/sched/features.h | 1 +
> kernel/sched/sched.h | 1 +
> 3 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e20f50726ab8..fe3b760c9654 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6629,6 +6629,21 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> hrtick_update(rq);
> now = sched_clock_cpu(cpu_of(rq));
> p->se.prev_sleep_time = task_sleep ? now : 0;
> +#ifdef CONFIG_SMP
> + /*
> + * If this rq will become idle, and dequeued task is
> + * a short sleeping one, check if we can reserve
> + * this idle CPU for that task for a short while.
> + * During this reservation period, other wakees will
> + * skip this 'idle' CPU in select_idle_cpu(), and this
> + * short sleeping task can pick its previous CPU in
> + * select_idle_sibling(), which brings better cache
> + * locality.
> + */
> + if (sched_feat(SIS_CACHE) && task_sleep && !rq->nr_running &&
> + p->se.sleep_avg && p->se.sleep_avg < sysctl_sched_migration_cost)
> + rq->cache_hot_timeout = now + p->se.sleep_avg;
> +#endif
> }
>
> #ifdef CONFIG_SMP
> @@ -6982,8 +6997,13 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
> static inline int __select_idle_cpu(int cpu, struct task_struct *p)
> {
> if ((available_idle_cpu(cpu) || sched_idle_cpu(cpu)) &&
> - sched_cpu_cookie_match(cpu_rq(cpu), p))
> + sched_cpu_cookie_match(cpu_rq(cpu), p)) {
> + if (sched_feat(SIS_CACHE) &&
> + sched_clock_cpu(cpu) < cpu_rq(cpu)->cache_hot_timeout)
> + return -1;

Just wondering,

Similar to how select_idle_core() caches the "idle_cpu" if it ends up
finding one in its search for an idle core, would returning a "cache-hot
idle CPU" be better than returning previous CPU / current CPU if all
idle CPUs found during the search in select_idle_cpu() are marked
cache-hot?

Speaking of cache-hot idle CPU, is netperf actually more happy with
piling on current CPU? I ask this because the logic seems to be
reserving the previous CPU for a task that dislikes migration but I
do not see anything in the wake_affine_idle() path that would make the
short sleeper proactively choose the previous CPU when the wakeup is
marked with the WF_SYNC flag. Let me know if I'm missing something?

To confirm this can you look at the trend in migration count with and
without the series? Also the ratio of cache-hot idle CPUs to number
of CPUs searched can help estimate overheads of additional search - I
presume SIS_UTIL is efficient at curbing the additional search in
a busy system.

In the meantime, I'll queue this series for testing on my end too.

> +
> return cpu;
> + }
>
> return -1;
> }
> [..snip..]


--
Thanks and Regards,
Prateek