Re: [PATCH 4/4] sched/topology: the group balance cpu must be a cpu where the group is installed
From: Lauro Venancio
Date: Tue Apr 25 2017 - 11:56:35 EST
On 04/25/2017 12:39 PM, Peter Zijlstra wrote:
> On Tue, Apr 25, 2017 at 05:27:03PM +0200, Peter Zijlstra wrote:
>> On Tue, Apr 25, 2017 at 05:22:36PM +0200, Peter Zijlstra wrote:
>>> On Tue, Apr 25, 2017 at 05:12:00PM +0200, Peter Zijlstra wrote:
>>>> But I'll first try and figure out why I'm not having empty masks.
>>> Ah, so this is before all the degenerate stuff, so there's a bunch of
>>> redundant domains below that make it work -- and there always will be,
>>> unless FORCE_SD_OVERLAP.
>>>
>>> Now I wonder what triggered it.. let me put it back.
>> Ah! the asymmetric setup, where @sibling is entirely uninitialized for
>> the top domain.
>>
> And it still works correctly too:
>
>
> [ 0.078756] XXX 1 NUMA
> [ 0.079005] XXX 2 NUMA
> [ 0.080003] XXY 0-2:0
> [ 0.081007] XXX 1 NUMA
> [ 0.082005] XXX 2 NUMA
> [ 0.083003] XXY 1-3:3
> [ 0.084032] XXX 1 NUMA
> [ 0.085003] XXX 2 NUMA
> [ 0.086003] XXY 1-3:3
> [ 0.087015] XXX 1 NUMA
> [ 0.088003] XXX 2 NUMA
> [ 0.089002] XXY 0-2:0
>
>
> [ 0.090007] CPU0 attaching sched-domain:
> [ 0.091002] domain 0: span 0-2 level NUMA
> [ 0.092002] groups: 0 (mask: 0), 1, 2
> [ 0.093002] domain 1: span 0-3 level NUMA
> [ 0.094002] groups: 0-2 (mask: 0) (cpu_capacity: 3072), 1-3 (cpu_capacity: 3072)
> [ 0.095005] CPU1 attaching sched-domain:
> [ 0.096003] domain 0: span 0-3 level NUMA
> [ 0.097002] groups: 1 (mask: 1), 2, 3, 0
> [ 0.098004] CPU2 attaching sched-domain:
> [ 0.099002] domain 0: span 0-3 level NUMA
> [ 0.100002] groups: 2 (mask: 2), 3, 0, 1
> [ 0.101004] CPU3 attaching sched-domain:
> [ 0.102002] domain 0: span 1-3 level NUMA
> [ 0.103002] groups: 3 (mask: 3), 1, 2
> [ 0.104002] domain 1: span 0-3 level NUMA
> [ 0.105002] groups: 1-3 (mask: 3) (cpu_capacity: 3072), 0-2 (cpu_capacity: 3072)
>
>
> static void
> build_group_mask(struct sched_domain *sd, struct sched_group *sg, struct cpumask *mask)
> {
> 	const struct cpumask *sg_span = sched_group_cpus(sg);
> 	struct sd_data *sdd = sd->private;
> 	struct sched_domain *sibling;
> 	int i, funny = 0;
>
> 	cpumask_clear(mask);
>
> 	for_each_cpu(i, sg_span) {
> 		sibling = *per_cpu_ptr(sdd->sd, i);
>
> 		if (!sibling->child) {
> 			funny = 1;
> 			printk("XXX %d %s %*pbl\n", i, sd->name, cpumask_pr_args(sched_domain_span(sibling)));
> 			continue;
> 		}
>
> 		/* If we would not end up here, we can't continue from here */
> 		if (!cpumask_equal(sg_span, sched_domain_span(sibling->child)))
> 			continue;
>
> 		cpumask_set_cpu(i, mask);
> 	}
>
> 	if (funny) {
> 		printk("XXY %*pbl:%*pbl\n",
> 		       cpumask_pr_args(sg_span),
> 		       cpumask_pr_args(mask));
> 	}
> }
>
>
> So that will still get the right balance cpu and thus sgc.
>
> Another thing I've been thinking about; I think we can do away with the
> kzalloc() in build_group_from_child_sched_domain() and use the sdd->sg
> storage.
I considered this too. I decided not to change it because I was not
sure whether the kzalloc() was there for performance reasons. Currently, all
groups are allocated on the NUMA node where they are used.
If we use the sdd->sg storage, we may end up with groups allocated on one
NUMA node but used on another.
>
> I just didn't want to move too much code around again, and ideally put
> more assertions in place to catch bad stuff; I just haven't had a good
> time thinking of good assertions :/