Re: [RESEND PATCH] sched/fair: consider RT/IRQ pressure in select_idle_sibling
From: Joel Fernandes
Date: Mon Jan 29 2018 - 22:39:23 EST
Hi Rohit,
On Mon, Jan 29, 2018 at 3:27 PM, Rohit Jain <rohit.k.jain@xxxxxxxxxx> wrote:
> Currently, the fast path in the scheduler looks for an idle CPU to
> schedule threads on. Capacity is taken into account in the function
> 'select_task_rq_fair' when it calls 'wake_cap'; however, that check
> looks only at the original capacity and ignores the instantaneous
> capacity. Furthermore, the select_idle_sibling path of the code
> ignores the RT/IRQ threads which are also running on the CPUs it is
> looking to schedule fair threads on.
>
> We don't necessarily have to force the code to go to the slow path (by
> modifying wake_cap); instead, we can do a better selection of the CPU
> in the current domain itself (i.e. in the fast path).
>
> This patch makes the fast path aware of capacity, resulting in overall
> performance improvements as shown in the test results.
>
[...]
>
> I also ran uperf and sysbench MySQL workloads but I see no statistically
> significant change.
>
> Signed-off-by: Rohit Jain <rohit.k.jain@xxxxxxxxxx>
> ---
> kernel/sched/fair.c | 38 ++++++++++++++++++++++++++++----------
> 1 file changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 26a71eb..ce5ccf8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5625,6 +5625,11 @@ static unsigned long capacity_orig_of(int cpu)
> return cpu_rq(cpu)->cpu_capacity_orig;
> }
>
> +static inline bool full_capacity(int cpu)
> +{
> + return capacity_of(cpu) >= (capacity_orig_of(cpu)*3)/4;
> +}
> +
> static unsigned long cpu_avg_load_per_task(int cpu)
> {
> struct rq *rq = cpu_rq(cpu);
> @@ -6081,7 +6086,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>
> for_each_cpu(cpu, cpu_smt_mask(core)) {
> cpumask_clear_cpu(cpu, cpus);
> - if (!idle_cpu(cpu))
> + if (!idle_cpu(cpu) || !full_capacity(cpu))
> idle = false;
> }
There's some difference in logic between select_idle_core and
select_idle_cpu as far as the full_capacity checks you're adding go.
In select_idle_core, if all CPUs are !full_capacity, you return -1.
But in select_idle_cpu, you return the best idle CPU, i.e. the one
with the most capacity among the !full_capacity ones. Why is there
this difference in logic? Did I miss something?
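
To illustrate what I mean, here is roughly how I read the two paths
after this patch (a simplified sketch, not your actual code; the
select_idle_cpu hunk isn't quoted above, and 'backup_cap'/'backup_cpu'
are hypothetical names I made up):

	/* select_idle_core(): a core only counts if every sibling is
	 * idle *and* full_capacity, so if all cores are pressured we
	 * fall through and return -1. */
	for_each_cpu(cpu, cpu_smt_mask(core)) {
		if (!idle_cpu(cpu) || !full_capacity(cpu))
			idle = false;
	}

	/* select_idle_cpu() (as I read your change there): an idle but
	 * !full_capacity CPU is still remembered and handed back as a
	 * fallback if no full-capacity idle CPU is found. */
	if (idle_cpu(cpu)) {
		if (full_capacity(cpu))
			return cpu;
		if (capacity_of(cpu) > backup_cap) {
			backup_cap = capacity_of(cpu);
			backup_cpu = cpu;
		}
	}

So the core path rejects pressured-but-idle cores entirely, while the
per-CPU path is happy to fall back to them.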
>
> @@ -6102,7 +6107,8 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> */
> static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
> {
> - int cpu;
> + int cpu, rcpu = -1;
> + unsigned long max_cap = 0;
>
> if (!static_branch_likely(&sched_smt_present))
> return -1;
> @@ -6110,11 +6116,13 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
> for_each_cpu(cpu, cpu_smt_mask(target)) {
> if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
> continue;
> - if (idle_cpu(cpu))
> - return cpu;
> + if (idle_cpu(cpu) && (capacity_of(cpu) > max_cap)) {
> + max_cap = capacity_of(cpu);
> + rcpu = cpu;
At the SMT level, do you need to bother with choosing the best
capacity among the sibling threads? If RT is eating into one SMT
thread's underlying capacity, it would eat into the other's as well.
I'm wondering what the benefit of doing this here is.
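
For what it's worth, this is the behaviour I'd have expected to be
sufficient at the SMT level, assuming the siblings' capacities track
each other (it's just the pre-patch loop, not something I've
benchmarked):

	for_each_cpu(cpu, cpu_smt_mask(target)) {
		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
			continue;
		/* If the siblings' capacities are coupled anyway, the
		 * first idle sibling should be as good as any other. */
		if (idle_cpu(cpu))
			return cpu;
	}
	return -1;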
thanks,
- Joel