Re: [RESEND PATCH v5 2/2] sched/fair: Scan cluster before scanning LLC in wake-up path

From: Yicong Yang
Date: Thu Jul 21 2022 - 08:42:21 EST


On 2022/7/21 18:33, Peter Zijlstra wrote:
> On Thu, Jul 21, 2022 at 09:38:04PM +1200, Barry Song wrote:
>> On Wed, Jul 20, 2022 at 11:15 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>
>>> On Wed, Jul 20, 2022 at 04:11:50PM +0800, Yicong Yang wrote:
>>>> +	/* TODO: Support SMT system with cluster topology */
>>>> +	if (!sched_smt_active() && sd) {
>>>> +		for_each_cpu_and(cpu, cpus, sched_domain_span(sd)) {
>>>
>>> So that's no SMT and no wrap iteration..
>>>
>>> Does something like this work?
>>>
>>> ---
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6437,6 +6437,30 @@ static int select_idle_cpu(struct task_s
>>>  		}
>>>  	}
>>>
>>> +	if (IS_ENABLED(CONFIG_SCHED_CLUSTER) &&
>>> +	    static_branch_unlikely(&sched_cluster_active)) {
>>> +		struct sched_domain *sdc = rcu_dereference(per_cpu(sd_cluster, target));
>>> +		if (sdc) {
>>> +			for_each_cpu_wrap(cpu, sched_domain_span(sdc), target + 1) {
>>> +				if (!cpumask_test_cpu(cpu, cpus))
>>> +					continue;
>>> +
>>> +				if (has_idle_core) {
>>> +					i = select_idle_core(p, cpu, cpus, &idle_cpu);
>>> +					if ((unsigned int)i < nr_cpumask_bits)
>>> +						return i;
>>> +				} else {
>>> +					if (--nr <= 0)
>>> +						return -1;
>>> +					idle_cpu = __select_idle_cpu(cpu, p);
>>> +					if ((unsigned int)idle_cpu < nr_cpumask_bits)
>>> +						break;
>>
>> Guess here it should be "return idle_cpu", but not "break". as "break"
>> will make us scan more
>> other cpus outside the cluster if we have found idle_cpu within the cluster.
>>

That can explain why the performance regresses when underloaded.
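
For reference, with Barry's suggestion applied, the cluster-scan block in Peter's draft above would look roughly like this (a sketch, not the final patch; only the "break" is replaced):

	if (IS_ENABLED(CONFIG_SCHED_CLUSTER) &&
	    static_branch_unlikely(&sched_cluster_active)) {
		struct sched_domain *sdc = rcu_dereference(per_cpu(sd_cluster, target));

		if (sdc) {
			/* Scan the cluster containing @target before the rest of the LLC. */
			for_each_cpu_wrap(cpu, sched_domain_span(sdc), target + 1) {
				if (!cpumask_test_cpu(cpu, cpus))
					continue;

				if (has_idle_core) {
					i = select_idle_core(p, cpu, cpus, &idle_cpu);
					if ((unsigned int)i < nr_cpumask_bits)
						return i;
				} else {
					if (--nr <= 0)
						return -1;
					idle_cpu = __select_idle_cpu(cpu, p);
					if ((unsigned int)idle_cpu < nr_cpumask_bits)
						return idle_cpu;	/* was "break" in the draft */
				}
			}
		}
	}

Returning here stops the search once an idle CPU is found inside the cluster, instead of falling through and scanning the rest of the LLC.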

>> Yicong,
>> Please test Peter's code with the above change.
>
> Indeed. Sorry for that.
>

The performance is still positive, based on the tip/sched/core commit this patch was developed against:
70fb5ccf2ebb ("sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg").

On numa 0:
                   tip/core              patched
Hmean     1         345.89 (   0.00%)       398.43 *  15.19%*
Hmean     2         697.77 (   0.00%)       794.40 *  13.85%*
Hmean     4        1392.51 (   0.00%)      1577.60 *  13.29%*
Hmean     8        2800.61 (   0.00%)      3118.38 *  11.35%*
Hmean     16       5514.27 (   0.00%)      6124.51 *  11.07%*
Hmean     32      10869.81 (   0.00%)     10690.97 *  -1.65%*
Hmean     64       8315.22 (   0.00%)      8520.73 *   2.47%*
Hmean     128      6324.47 (   0.00%)      7253.65 *  14.69%*

On numa 0-1:
                   tip/core              patched
Hmean     1         348.68 (   0.00%)       397.74 *  14.07%*
Hmean     2         693.57 (   0.00%)       795.54 *  14.70%*
Hmean     4        1369.26 (   0.00%)      1548.72 *  13.11%*
Hmean     8        2772.99 (   0.00%)      3055.54 *  10.19%*
Hmean     16       4825.83 (   0.00%)      5936.64 *  23.02%*
Hmean     32      10250.32 (   0.00%)     11780.59 *  14.93%*
Hmean     64      16309.51 (   0.00%)     19864.38 *  21.80%*
Hmean     128     13022.32 (   0.00%)     16365.43 *  25.67%*
Hmean     256     11335.79 (   0.00%)     13991.33 *  23.43%*

Hi Peter,

Do you want me to respin a v6 based on your change?

Thanks,
Yicong