Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing destination

From: Rik van Riel
Date: Wed May 13 2015 - 09:52:16 EST

On 05/13/2015 02:29 AM, Peter Zijlstra wrote:
> On Tue, May 12, 2015 at 11:45:09AM -0400, Rik van Riel wrote:
>> I have a few poorly formed ideas on what could be done about that:
>> 1) have fbq_classify_rq take the current task on the rq into account,
>> and adjust the fbq classification if all the runnable-but-queued
>> tasks are on the right node
> So while looking at this I came up with the below; it treats anything
> inside ->active_nodes as a preferred node for balancing purposes.
> Would that make sense?

Not necessarily.

If there are two multi-threaded workloads on a two-node system, and
they have not yet converged on one node each, both nodes will be
part of ->active_nodes.

Treating them as preferred nodes means the load balancing code
would do nothing at all to help the workloads converge.
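
To make the concern concrete, here is a toy user-space model, not
kernel code. The struct names, the bitmask form of active_nodes and
the use_active_nodes switch are all made up for illustration; only
the regular/remote/all naming follows enum fbq_type. Every toy task
is assumed to be a NUMA task.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same names as the kernel's enum fbq_type; everything else is a toy. */
enum fbq_type { regular = 0, remote = 1, all = 2 };

struct toy_task {
	int preferred_nid;          /* node the task would like to run on */
	unsigned long active_nodes; /* hypothetical bitmask of its group's active nodes */
};

struct toy_rq {
	int nid;                      /* node this runqueue belongs to */
	const struct toy_task *tasks; /* tasks currently on the runqueue */
	size_t nr_running;
};

/*
 * Classify a runqueue. With use_active_nodes == false a task only counts
 * as "on the right node" when the rq is on its preferred node. With
 * use_active_nodes == true any node in its active_nodes mask counts.
 */
enum fbq_type toy_classify(const struct toy_rq *rq, bool use_active_nodes)
{
	size_t nr_preferred = 0;

	for (size_t i = 0; i < rq->nr_running; i++) {
		const struct toy_task *t = &rq->tasks[i];
		bool ok = use_active_nodes
			? ((t->active_nodes >> rq->nid) & 1UL) != 0
			: t->preferred_nid == rq->nid;

		if (ok)
			nr_preferred++;
	}

	/* All toy tasks are NUMA tasks, so "regular" never applies here. */
	return rq->nr_running > nr_preferred ? remote : all;
}
```

With one thread of each workload on a node 0 runqueue, and both
workloads having nodes 0 and 1 active, the strict test reports
"remote" while the active_nodes test reports "all". The balancer no
longer sees any misplaced task it could move to help convergence.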

> I'll see what I can do about current in the runqueue type
> classification.

This check can probably be racy, so just reading a value from the
task struct of the runqueue's current task should be ok. I am not
aware of any architecture where the task struct address can
become invalid. The worst thing that could happen is that the
bits examined change value.
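
A rough user-space sketch of what I have in mind, not a patch. The
toy_* names are hypothetical, TOY_NO_NODE stands in for "no preferred
node", and relaxed C11 atomic loads stand in for a lockless read in
the kernel. The idea is to discount the current task from the
counters, so the classification describes only the queued tasks,
which are the ones the load balancer can actually move.

```c
#include <stdatomic.h>
#include <stddef.h>

enum fbq_type { regular = 0, remote = 1, all = 2 };

#define TOY_NO_NODE (-1) /* hypothetical marker: task has no preferred node */

struct toy_task {
	_Atomic int numa_preferred_nid; /* may change under us; that is fine */
};

struct toy_rq {
	int nid;                           /* node this runqueue belongs to */
	unsigned int nr_running;
	unsigned int nr_numa_running;      /* tasks with a preferred node */
	unsigned int nr_preferred_running; /* tasks on their preferred node */
	struct toy_task *_Atomic curr;     /* task currently on the CPU */
};

static enum fbq_type toy_classify_counts(unsigned int nr_running,
					 unsigned int nr_numa,
					 unsigned int nr_preferred)
{
	if (nr_running > nr_numa)
		return regular;
	if (nr_running > nr_preferred)
		return remote;
	return all;
}

/* Classification from the raw counters, current task included. */
enum fbq_type toy_classify_plain(struct toy_rq *rq)
{
	return toy_classify_counts(rq->nr_running, rq->nr_numa_running,
				   rq->nr_preferred_running);
}

/* Classification of the queued tasks only; curr is read without locking. */
enum fbq_type toy_classify_queued(struct toy_rq *rq)
{
	unsigned int nr_running = rq->nr_running;
	unsigned int nr_numa = rq->nr_numa_running;
	unsigned int nr_preferred = rq->nr_preferred_running;
	struct toy_task *curr =
		atomic_load_explicit(&rq->curr, memory_order_relaxed);

	if (curr && nr_running) {
		/* Racy read: a stale value only skews the heuristic. */
		int nid = atomic_load_explicit(&curr->numa_preferred_nid,
					       memory_order_relaxed);

		nr_running--;
		if (nid != TOY_NO_NODE && nr_numa)
			nr_numa--;
		if (nid == rq->nid && nr_preferred)
			nr_preferred--;
	}

	return toy_classify_counts(nr_running, nr_numa, nr_preferred);
}
```

If the only misplaced task on a runqueue is the one currently
running, toy_classify_plain() says "remote" but toy_classify_queued()
says "all", since nothing that can be pulled is on the wrong node.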

>> 2) ensure that rq->nr_numa_running and rq->nr_preferred_running also
>> get incremented for kernel threads that are bound to a particular
>> CPU - currently CPU-bound kernel threads will cause the NUMA
>> statistics to look like a CPU has tasks that do not belong on that
>> NUMA node
> I'm thinking accounting those to nr_pinned, lemme see how that works
> out.


