Re: [PATCH] sched/numa: Restore sched feature NUMA to its earlier avatar.

From: Ingo Molnar
Date: Thu Jul 09 2015 - 02:29:46 EST



* Rik van Riel <riel@xxxxxxxxxx> wrote:

> On 07/08/2015 09:56 AM, Ingo Molnar wrote:
> >
> > * Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >> In commit:8a9e62a "sched/numa: Prefer NUMA hotness over cache hotness"
> >> sched feature NUMA was always set to true. However this sched feature was
> >> suppose to be enabled on NUMA boxes only thro set_numabalancing_state().
> >>
> >> To get back to the above behaviour, bring back NUMA_FAVOUR_HIGHER feature.
> >
> > Three typos and a non-standard commit ID reference.
> >
> >> /*
> >> + * NUMA_FAVOUR_HIGHER will favor moving tasks towards nodes where a
> >> + * higher number of hinting faults are recorded during active load
> >> + * balancing. It will resist moving tasks towards nodes where a lower
> >> + * number of hinting faults have been recorded.
> >> */
> >> -SCHED_FEAT(NUMA, true)
> >> +SCHED_FEAT(NUMA_FAVOUR_HIGHER, true)
> >> #endif
> >>
> >
> > So the comment spells 'favor' American, the constant you introduce is British
> > spelling via 'FAVOUR'? Please use it consistently!
> >
> > Also, this name is totally non-intuitive.
> >
> > Make it something like NUMA_FAVOR_BUSY_NODES or so?
>
> It is not about relocating tasks to busier nodes. The scheduler still
> moves tasks from busier nodes to idler nodes.
>
> This code makes the scheduler more prone to move tasks from nodes where
> they have fewer NUMA faults, to nodes where they have more.
>
> Not sure what a good name would be to describe that...

So I find the patch, the description and the comments in the code conflicting and
confusing.

The patch does this:

@@ -5676,10 +5676,10 @@ static int migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
         unsigned long src_faults, dst_faults;
         int src_nid, dst_nid;

-        if (!p->numa_faults || !(env->sd->flags & SD_NUMA))
+        if (!sched_feat(NUMA) || !sched_feat(NUMA_FAVOUR_HIGHER))
                 return -1;

-        if (!sched_feat(NUMA))
+        if (!p->numa_faults || !(env->sd->flags & SD_NUMA))
                 return -1;

         src_nid = cpu_to_node(env->src_cpu);


where the default for 'NUMA' is 0, while the default for 'NUMA_FAVOUR_HIGHER' is 1.

Which in itself is confusing: WTH do we have a generic switch called 'NUMA' and
then have it disabled?

Secondly, and more importantly, this patch is equivalent to adding this (for the
default case):

        return -1;

i.e. it's in essence a revert of 8a9e62a!
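
To make that concrete, here is a rough userspace sketch of the first check in the patched function. It is not kernel code: 'feat_model', 'default_feats' and 'patched_first_check' are hypothetical names standing in for the sched_feat() machinery, and the defaults are simply the ones quoted above (NUMA = 0, NUMA_FAVOUR_HIGHER = 1).

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two sched features involved.
 * The real kernel uses sched_feat(); this struct is an assumption
 * made purely for illustration.
 */
struct feat_model {
	bool numa;               /* models 'NUMA', default 0 */
	bool numa_favour_higher; /* models 'NUMA_FAVOUR_HIGHER', default 1 */
};

/* The defaults as described in this mail. */
static const struct feat_model default_feats = {
	.numa = false,
	.numa_favour_higher = true,
};

/*
 * Models only the first check of the patched
 * migrate_degrades_locality(): returns -1 when the function would
 * bail out immediately, 0 when it would fall through to the
 * remaining checks (which are not modelled here).
 */
static int patched_first_check(const struct feat_model *f)
{
	if (!f->numa || !f->numa_favour_higher)
		return -1;

	return 0;
}
```

With 'default_feats', patched_first_check() returns -1 no matter what the task's fault statistics look like, because '!f->numa' is already true. Only when both features are set does the check fall through to the rest of the function.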

And it provides no explanation whatsoever. Why did we do the original change
(8a9e62a) which was well argued but apparently broken in some fashion, and why do
we want to change it back now?

I.e. this patch sucks on multiple grounds, and 8a9e62a probably sucks as well. And
you added a Reviewed-by while you should have noticed at least 2-3 flaws in the
patch and its approach. Not good.

Thanks,

Ingo