Re: [patch 2/2] sched: Scale the nohz_tracker logic by making it per NUMA node

From: Pallipadi, Venkatesh
Date: Mon Dec 14 2009 - 17:33:00 EST


On Mon, 2009-12-14 at 14:21 -0800, Peter Zijlstra wrote:
> On Thu, 2009-12-10 at 17:27 -0800, venkatesh.pallipadi@xxxxxxxxx wrote:
> > Having one idle CPU doing the rebalancing for all the idle CPUs in
> > nohz mode does not scale well with increasing number of cores and
> > sockets. Make the nohz_tracker per NUMA node. This results in multiple
> > idle load balancing happening at NUMA node level and idle load balancer
> > only does the rebalance domain among all the other nohz CPUs in that
> > NUMA node.
> >
> > This addresses the below problem with the current nohz ilb logic
> > * The lone balancer may end up spending a lot of time doing the
> >   balancing on behalf of nohz CPUs, especially with increasing number
> >   of sockets and cores in the platform.
>
> If the purpose is to keep sockets idle, doing things per node doesn't
> seem like a fine plan, since we're having nodes <= socket machines these
> days.

The idea is to do idle load balancing only within a node.
E.g., take a 4 node (and 4 socket) system with each socket having 4
cores, and a single active thread on it, say on socket 3.
Without this change we have 1 idle load balancer (which may be in socket
0) that keeps its periodic tick, and the remaining 14 cores are tickless.
But this one idle load balancer does the load balancing on behalf of
itself + 14 other idle cores.

With the change proposed in this patch, we will have 3 completely idle
nodes/sockets. We will not do any load balancing on these cores at all.
The one remaining active socket will have one idle load balancer, which,
when needed, will do idle load balancing on behalf of itself + 2 other
idle cores in that socket.

If all sockets have at least one busy core, then we may have more
than one idle load balancer, but each will only do idle load balancing
on behalf of the idle processors in its own node, so the total idle load
balancing work will be the same as now.
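
To make the counting above concrete, here is a rough user-space model of
it. This is only a sketch, not kernel code: ilb_work() and its parameters
are made-up names for illustration, and it assumes that a node (or the
whole system) gets an idle load balancer only when it has at least one
busy and at least one idle CPU.

```python
# Toy user-space model of the accounting above; not kernel code.
# All names here are hypothetical and exist only for this illustration.

def ilb_work(busy_per_node, cores_per_node, per_node):
    """Return one entry per idle load balancer (ilb): the number of idle
    CPUs it balances on behalf of, itself included."""
    idle_per_node = [cores_per_node - busy for busy in busy_per_node]

    if not per_node:
        # Current logic: a single ilb covers every idle CPU in the system.
        # Assumption: no ilb is needed when nothing at all is busy.
        total_idle = sum(idle_per_node)
        if total_idle and any(busy_per_node):
            return [total_idle]
        return []

    # Proposed logic: one ilb per node, and only for nodes that have
    # both a busy CPU and idle CPUs. Fully idle nodes get no ilb.
    return [
        idle
        for idle, busy in zip(idle_per_node, busy_per_node)
        if idle and busy
    ]


# 4 nodes x 4 cores, one active thread on socket 3.
single_thread = [0, 0, 0, 1]
print(ilb_work(single_thread, 4, per_node=False))  # one ilb for the system
print(ilb_work(single_thread, 4, per_node=True))   # one ilb, busy node only

# Every socket has one busy core.
all_busy = [1, 1, 1, 1]
print(sum(ilb_work(all_busy, 4, per_node=False)))
print(sum(ilb_work(all_busy, 4, per_node=True)))
```

In this model the single-thread case gives one balancer covering 15 idle
CPUs today versus one balancer covering 3 with the per-node tracker, and
in the all-sockets-busy case the total number of idle CPUs balanced is
the same under both schemes, just split across four balancers.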

Thanks,
Venki
