Re: [PATCH v4 1/2] sched: Add per_cpu cluster domain info and cpus_share_resources API

From: Yicong Yang
Date: Thu Jun 16 2022 - 04:10:59 EST


On 2022/6/15 23:43, Gautham R. Shenoy wrote:
> On Wed, Jun 15, 2022 at 07:49:22PM +0530, K Prateek Nayak wrote:
>
> [..snip..]
>
>>
>> - Bisecting:
>>
>> When we ran the tests with only Patch 1 of the series, the
>> regression was visible and the numbers were worse.
>>
>> Clients: tip cluster Patch 1 Only
>> 8 3263.81 (0.00 pct) 3086.81 (-5.42 pct) 3018.63 (-7.51 pct)
>> 16 6011.19 (0.00 pct) 5360.28 (-10.82 pct) 4869.26 (-18.99 pct)
>> 32 12058.31 (0.00 pct) 8769.08 (-27.27 pct) 8159.60 (-32.33 pct)
>> 64 21258.21 (0.00 pct) 19021.09 (-10.52 pct) 13161.92 (-38.08 pct)
>>
>> We further bisected the hunks to narrow down the cause to the per CPU
>> variable declarations.
>>
>>
>>>
>>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>>> index 01259611beb9..b9bcfcf8d14d 100644
>>> --- a/kernel/sched/sched.h
>>> +++ b/kernel/sched/sched.h
>>> @@ -1753,7 +1753,9 @@ static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
>>> DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
>>> DECLARE_PER_CPU(int, sd_llc_size);
>>> DECLARE_PER_CPU(int, sd_llc_id);
>>> +DECLARE_PER_CPU(int, sd_share_id);
>>> DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
>>> +DECLARE_PER_CPU(struct sched_domain __rcu *, sd_cluster);
>>
>> The main reason for the regression seems to be the above declarations.
>
> I think you meant that the regressions are due to the DEFINE_PER_CPU()
> instances from the following hunk:
>
>>> @@ -664,6 +664,8 @@ static void destroy_sched_domains(struct sched_domain *sd)
>>> DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
>>> DEFINE_PER_CPU(int, sd_llc_size);
>>> DEFINE_PER_CPU(int, sd_llc_id);
>>> +DEFINE_PER_CPU(int, sd_share_id);
>>> +DEFINE_PER_CPU(struct sched_domain __rcu *, sd_cluster);
>>> DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
>>> DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
>>> DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
>>>
>
>
> The System.map diff for these variables between tip vs tip +
> cluster-sched-v4 on your test system looks as follows:
>
> 0000000000020520 D sd_asym_packing
> 0000000000020528 D sd_numa
> -0000000000020530 D sd_llc_shared
> -0000000000020538 D sd_llc_id
> -000000000002053c D sd_llc_size
> -0000000000020540 D sd_llc
> +0000000000020530 D sd_cluster
> +0000000000020538 D sd_llc_shared

looks like below are in another cacheline (for 64B cacheline)?
while previous sd_llc_id and sd_llc_shared are in the same.

> +0000000000020540 D sd_share_id
> +0000000000020544 D sd_llc_id
> +0000000000020548 D sd_llc_size
> +0000000000020550 D sd_llc
>
> The allocations are in the reverse-order of the definitions.
>
> That perhaps explains why you no longer see the regression when you
> define the sd_share_id and sd_cluster per-cpu definitions at the
> beginning as indicated by the following
>
>> - Move the declarations of sd_share_id and sd_cluster to the top
>>
>> Clients: tip Patch 1 Patch 1 (Declarion on Top)
>> 8 3255.69 (0.00 pct) 3018.63 (-7.28 pct) 3072.30 (-5.63 pct)
>> 16 6092.67 (0.00 pct) 4869.26 (-20.08 pct) 5586.59 (-8.30 pct)
>> 32 11156.56 (0.00 pct) 8159.60 (-26.86 pct) 11184.17 (0.24 pct)
>> 64 21019.97 (0.00 pct) 13161.92 (-37.38 pct) 20289.70 (-3.47 pct)
>
>
> --
> Thanks and Regards
> gautham.
> .
>