Re: [PATCH] sched/fair: optimize should_we_balance for higher SMT systems

From: Tim Chen
Date: Tue Sep 05 2023 - 15:31:22 EST


On Sat, 2023-09-02 at 13:42 +0530, Shrikanth Hegde wrote:
>
>
> Fixes: b1bfeab9b002 ("sched/fair: Consider the idle state of the whole core for load balance")
> Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxxxxxxx>
> ---
> kernel/sched/fair.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0b7445cd5af9..6e31923293bb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6619,6 +6619,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> /* Working cpumask for: load_balance, load_balance_newidle. */
> static DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);
> static DEFINE_PER_CPU(cpumask_var_t, select_rq_mask);
> +static DEFINE_PER_CPU(cpumask_var_t, should_we_balance_tmpmask);
>
> #ifdef CONFIG_NO_HZ_COMMON
>
> @@ -10913,6 +10914,7 @@ static int active_load_balance_cpu_stop(void *data);
>
> static int should_we_balance(struct lb_env *env)
> {
> + struct cpumask *swb_cpus = this_cpu_cpumask_var_ptr(should_we_balance_tmpmask);
> struct sched_group *sg = env->sd->groups;
> int cpu, idle_smt = -1;
>
> @@ -10936,8 +10938,9 @@ static int should_we_balance(struct lb_env *env)
> return 1;
> }
>
> + cpumask_copy(swb_cpus, group_balance_mask(sg));
> /* Try to find first idle CPU */
> - for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
> + for_each_cpu_and(cpu, swb_cpus, env->cpus) {
> if (!idle_cpu(cpu))
> continue;
>
> @@ -10949,6 +10952,14 @@ static int should_we_balance(struct lb_env *env)
> if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
> if (idle_smt == -1)
> idle_smt = cpu;
> + /*
> + * If the core is not idle and an idle SMT sibling has
> + * already been found, there is no need to check the
> + * other SMT siblings of this core for idleness.
> + */
> +#ifdef CONFIG_SCHED_SMT
> + cpumask_andnot(swb_cpus, swb_cpus, cpu_smt_mask(cpu));
> +#endif
> continue;
> }
>
> @@ -12914,6 +12925,8 @@ __init void init_sched_fair_class(void)
> for_each_possible_cpu(i) {
> zalloc_cpumask_var_node(&per_cpu(load_balance_mask, i), GFP_KERNEL, cpu_to_node(i));
> zalloc_cpumask_var_node(&per_cpu(select_rq_mask, i), GFP_KERNEL, cpu_to_node(i));
> + zalloc_cpumask_var_node(&per_cpu(should_we_balance_tmpmask, i),
> + GFP_KERNEL, cpu_to_node(i));

Shrikanth,

I wonder if we can avoid allocating should_we_balance_tmpmask
in the SMT2 case, to save memory on systems with a large number
of cores.

I think the new mask and the sibling-pruning logic are only needed
when a core has more than 2 SMT threads.
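
Something along these lines, perhaps (an untested sketch; it assumes
cpu_smt_mask() already reflects the full SMT topology by the time
init_sched_fair_class() runs, which would need checking):

#ifdef CONFIG_SCHED_SMT
	/*
	 * The temporary mask only buys anything when a core has
	 * more than 2 SMT siblings; with SMT2 there is at most
	 * one sibling to prune per busy core.
	 */
	if (cpumask_weight(cpu_smt_mask(i)) > 2)
		zalloc_cpumask_var_node(&per_cpu(should_we_balance_tmpmask, i),
					GFP_KERNEL, cpu_to_node(i));
#endif

should_we_balance() would then need a fallback to iterating
group_balance_mask(sg) directly on CPUs where the mask was not
allocated.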

Tim
>
> #ifdef CONFIG_CFS_BANDWIDTH
> INIT_CSD(&cpu_rq(i)->cfsb_csd, __cfsb_csd_unthrottle, cpu_rq(i));
> --
> 2.31.1
>