Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in select_idle_sibling code path

From: Rohit Jain
Date: Tue Sep 26 2017 - 15:45:32 EST


On 09/25/2017 11:53 PM, Joel Fernandes wrote:
Hi Rohit,

Just some comments:

Hi Joel,

Thanks for the comments.

On Mon, Sep 25, 2017 at 5:02 PM, Rohit Jain <rohit.k.jain@xxxxxxxxxx> wrote:
While looking for CPUs to place running tasks on, the scheduler
completely ignores the capacity stolen away by RT/IRQ tasks.

This patch fixes that.

Signed-off-by: Rohit Jain <rohit.k.jain@xxxxxxxxxx>
---
kernel/sched/fair.c | 54 ++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 43 insertions(+), 11 deletions(-)
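
For reference, the full_capacity() helper used below is introduced in
patch 1/3 of this series; the idea is a check along these lines, where
the ~80% cutoff is illustrative rather than the series' exact
threshold:

static inline bool full_capacity(int cpu)
{
        /* true while RT/IRQ pressure has eaten less than ~20% of capacity */
        return capacity_of(cpu) >= (capacity_orig_of(cpu) * 819 >> 10);
}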

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index afb701f..19ff2c3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6040,7 +6040,10 @@ void __update_idle_core(struct rq *rq)
static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
{
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
- int core, cpu;
+ int core, cpu, rcpu, rcpu_backup;
I would call rcpu_backup backup_cpu.

OK


+ unsigned int backup_cap = 0;
+
+ rcpu = rcpu_backup = -1;

if (!static_branch_likely(&sched_smt_present))
return -1;
@@ -6057,10 +6060,20 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
cpumask_clear_cpu(cpu, cpus);
if (!idle_cpu(cpu))
idle = false;
+
+ if (full_capacity(cpu)) {
+ rcpu = cpu;
+ } else if ((rcpu == -1) && (capacity_of(cpu) > backup_cap)) {
+ backup_cap = capacity_of(cpu);
+ rcpu_backup = cpu;
+ }
Here you are comparing the capacity of different SMT threads.

}

- if (idle)
- return core;
+ if (idle) {
+ if (rcpu == -1)
+ return (rcpu_backup != -1 ? rcpu_backup : core);
+ return rcpu;
+ }

This didn't make much sense to me: here you are returning either an
SMT thread or a core, which shouldn't make much of a difference,
because SMT threads share the same capacity (SD_SHARE_CPUCAPACITY). I
think what you want to do is find the capacity of a 'core', not of an
SMT thread, compare the capacities of different cores, and pick the
one with the least RT/IRQ interference.
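Something like this, say (rough, untested sketch; smt_core_capacity()
is a made-up name, not an existing helper):

static unsigned long smt_core_capacity(int core)
{
        unsigned long cap = ULONG_MAX;
        int cpu;

        /*
         * SMT siblings are expected to share capacity, so take the
         * minimum over the siblings as the core's remaining capacity
         * and prefer the core where this value is highest.
         */
        for_each_cpu(cpu, cpu_smt_mask(core))
                cap = min(cap, capacity_of(cpu));

        return cap;
}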

IIUC, the capacity of each strand is scaled by the IRQ and 'rt_avg'
numbers for that 'rq'. Now, if the strand that is idle gets an
interrupt in the future, the 'core' would look like:

   +----+----+
   | I  |    |
   | T  |    |
   +----+----+

(I -> Interrupt, T -> Thread we are trying to schedule).

whereas if the other strand on the core were taking the interrupt, the
core would look like:

   +----+----+
   | I  | T  |
   |    |    |
   +----+----+

In this case, because we know from the past average that one of the
strands is running low on capacity, I am trying to return the better
strand for the thread to start on.
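
For reference, this is roughly where that scaling comes from today
(simplified from scale_rt_capacity() in mainline kernel/sched/fair.c;
update_cpu_capacity() multiplies capacity_orig_of(cpu) by this factor
to produce what capacity_of(cpu) returns):

static unsigned long scale_rt_capacity(int cpu)
{
        struct rq *rq = cpu_rq(cpu);
        u64 total, used, avg;
        s64 delta;

        /* rt_avg accumulates time this rq spent in RT and IRQ context */
        avg = READ_ONCE(rq->rt_avg);
        delta = rq_clock(rq) - READ_ONCE(rq->age_stamp);
        if (unlikely(delta < 0))
                delta = 0;

        total = sched_avg_period() + delta;
        used = div_u64(avg, total);

        /* whatever RT/IRQ used comes straight off the strand's capacity */
        if (likely(used < SCHED_CAPACITY_SCALE))
                return SCHED_CAPACITY_SCALE - used;

        return 1;
}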


}

/*
@@ -6076,7 +6089,8 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
*/
static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
{
- int cpu;
+ int cpu, backup_cpu = -1;
+ unsigned int backup_cap = 0;

if (!static_branch_likely(&sched_smt_present))
return -1;
@@ -6084,11 +6098,17 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
for_each_cpu(cpu, cpu_smt_mask(target)) {
if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
continue;
- if (idle_cpu(cpu))
- return cpu;
+ if (idle_cpu(cpu)) {
+ if (full_capacity(cpu))
+ return cpu;
+ if (capacity_of(cpu) > backup_cap) {
+ backup_cap = capacity_of(cpu);
+ backup_cpu = cpu;
+ }
+ }
Same thing here: since SMT threads share the same underlying capacity,
is there any point in comparing the capacities of each SMT thread?

See above

Thanks,
Rohit


thanks,

- Joel

[...]