Hi Rohit,
On Thu, Sep 28, 2017 at 8:09 AM, Rohit Jain <rohit.k.jain@xxxxxxxxxx> wrote:
[..]
With this case, because we know from the past avg that one of the strands is
running low on capacity, I am trying to return a better strand for the
thread to start on.

I know what you're trying to do, but the way you've retrofitted it into the
core looks weird (to me) and makes the code unreadable and ugly IMO.
Why not do something simpler, like skipping the core if any SMT thread has
been running at lesser capacity? I'm not sure whether this works great, or
whether the maintainers will prefer your approach or mine, but I find the
diff below much cleaner for the select_idle_core bit. It also makes sense to
me to skip the core altogether here, since resources are shared at the SMT
level:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6ee7242dbe0a..f324a84e29f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5738,14 +5738,17 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 	for_each_cpu_wrap(core, cpus, target) {
 		bool idle = true;
+		bool full_cap = true;
 
 		for_each_cpu(cpu, cpu_smt_mask(core)) {
 			cpumask_clear_cpu(cpu, cpus);
 			if (!idle_cpu(cpu))
 				idle = false;
+			if (!full_capacity(cpu))
+				full_cap = false;
 		}
 
-		if (idle)
+		if (idle && full_cap)
 			return core;
 	}
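
(For anyone reading along: full_capacity() is the helper your series
introduces and isn't shown in this mail. Roughly, it checks whether a CPU's
current scaled capacity is still close to its original capacity. The sketch
below is only illustrative; the 3/4 cutoff is my assumption, not the value
from the patch:)

static inline bool full_capacity(int cpu)
{
	/*
	 * Illustrative sketch only; the real helper comes from Rohit's
	 * series. Treat a CPU as "full capacity" when capacity_of() has
	 * not dropped much below capacity_orig_of(). The 3/4 threshold
	 * here is an assumption for illustration.
	 */
	return capacity_of(cpu) >= (capacity_orig_of(cpu) * 3) / 4;
}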
Well, with your changes you will skip over fully idle cores, which is not
an ideal thing either. I see that you were advocating for selecting an
idle + lowest-capacity core, whereas I was stopping at the first idle core.
Since the whole philosophy till now in this patch is "Don't spare an
idle CPU", I think the following diff might look better to you. Please
note this is only for discussion's sake; I haven't fully tested it yet.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ec15e5f..c2933eb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6040,7 +6040,9 @@ void __update_idle_core(struct rq *rq)
 static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
-	int core, cpu;
+	int core, cpu, rcpu, backup_core;
+
+	rcpu = backup_core = -1;
 
 	if (!static_branch_likely(&sched_smt_present))
 		return -1;
@@ -6052,15 +6054,34 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 	for_each_cpu_wrap(core, cpus, target) {
 		bool idle = true;
+		bool full_cap = true;
 
 		for_each_cpu(cpu, cpu_smt_mask(core)) {
 			cpumask_clear_cpu(cpu, cpus);
 			if (!idle_cpu(cpu))
 				idle = false;
+
+			if (!full_capacity(cpu)) {
+				full_cap = false;
+			}
 		}
 
-		if (idle)
+		if (idle && full_cap)
 			return core;
+		else if (idle && backup_core == -1)
+			backup_core = core;
+	}
+
+	if (backup_core != -1) {
+		for_each_cpu(cpu, cpu_smt_mask(backup_core)) {
+			if (full_capacity(cpu))
+				return cpu;
+			else if ((rcpu == -1) ||
+				 (capacity_of(cpu) > capacity_of(rcpu)))
+				rcpu = cpu;
+		}
+
+		return rcpu;
 	}
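
Spelled out (same caveat: untested, only to make the intent of the diff
above easier to follow), the fallback it adds after the main loop boils
down to something like the helper below, where backup_core is the first
fully idle core whose siblings were not all at full capacity. The helper
name is made up for illustration and is not part of the diff:

/*
 * Illustration only: given the remembered backup core, prefer a sibling
 * that still has full capacity, otherwise return the sibling with the
 * highest remaining capacity.
 */
static int backup_core_cpu(int backup_core)
{
	int cpu, rcpu = -1;

	for_each_cpu(cpu, cpu_smt_mask(backup_core)) {
		if (full_capacity(cpu))
			return cpu;
		if (rcpu == -1 || capacity_of(cpu) > capacity_of(rcpu))
			rcpu = cpu;
	}

	return rcpu;
}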
Do let me know what you think.
I think that if there isn't a benefit in your tests in doing the above
vs. the simpler approach, then I prefer the simpler approach, especially
since there's no point/benefit in complicating the code for
select_idle_core.
thanks,
- Joel