Re: [RFC 3/3] sched/topology: Different sched groups must not have the same balance cpu

From: Lauro Venancio
Date: Mon Apr 17 2017 - 11:34:20 EST


On 04/14/2017 01:49 PM, Peter Zijlstra wrote:
> On Thu, Apr 13, 2017 at 10:56:09AM -0300, Lauro Ramos Venancio wrote:
>> Currently, the group balance cpu is the groups's first CPU. But with
>> overlapping groups, two different groups can have the same first CPU.
>>
>> This patch uses the group mask to mark all the CPUs that have a
>> particular group as its main sched group. The group balance cpu is the
>> first group CPU that is also in the mask.
> Please give a NUMA configuration and CPU number where this goes wrong.
On a 4 nodes with ring topology, the groups (0-1,3 [cpu 0]), (0-2 [cpu
1]) and (0,2-3 [cpu 3]) share the same sched_group_capacity instance
when the first groups cpu is used to select the sgc.
>
> Because only the first group of a domain matters, and with the other
> thing fixed, I'm not immediately seeing where we go wobbly.

Before patch 2, the group balance cpu was implicitly used to select the
sched_group_capacity instance. When two different groups had the same
balance cpu, they shared the same sched_group_capacity instance.

After patch 2, one different sched_group_capacity instance is assigned
to each group instance.


This patch ensures tree things:

1) different instances of the same group share the same
sched_group_capacity instance.

2) instances of different groups don't share the same
sched_group_capacity instance.

3) the group balance cpu must be one of the cpus where the group is
installed.


I am rebasing this patch on top of your patches.