Re: [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
From: Valentin Schneider
Date: Thu Apr 16 2020 - 11:27:46 EST
On 16/04/20 14:36, Vincent Guittot wrote:
>> Coming back to the v2 discussion on this patch
>>
>> https://lore.kernel.org/r/20200311181601.18314-10-valentin.schneider@xxxxxxx
>>
>> SD_BALANCE_WAKE is not used in mainline anymore, so wakeups are always
>> fast today.
>>
>> I.e. you wouldn't need a per_cpu(sd_balance_wake, cpu) since it's always
>> NULL.
>>
>> I.e. want_affine logic and the 'for_each_domain(cpu, tmp)' isn't needed
>> anymore.
>>
>> This will dramatically simplify the code in select_task_rq_fair().
>>
>> But I guess Vincent wants to keep the functionality so we're able to
>> enable SD_BALANCE_WAKE on certain sd's?
>
> I looked too quickly what was done by this patch. I thought that it
> was adding a per_cpu pointer for all cases including the fast path
> with wake affine but it only skips the for_each_domain loop for the
> slow paths which don't need it because they are already slow.
>
> It would be better to keep the for_each_domain loop for slow paths and
> to use a per_cpu pointer for fast_path/wake affine. Regarding the
> wake_affine path, we don't really care about looping all domains and
> we could directly use the highest domain because wake_affine() that is
> used in the loop, only uses the imbalance_pct of the sched domain for
> wake_affine_weight() and it should not harm to use only the highest
> domain and then select_idle_sibling doesn't use it but the llc or
> asym_capacity pointer instead.
So Dietmar's pointing out that sd will always be NULL for want_affine,
because want_affine can only be true at wakeups and we don't have any
topologies with SD_BALANCE_WAKE anymore. We would still want to call
wake_affine() though, because that can change new_cpu.
What you are adding on top is that the only sd field used in wake_affine()
is the imbalance_pct, so we could take a shortcut and just go for the
highest domain with SD_WAKE_AFFINE; i.e. something like this:
---
if (want_affine) {
// We can cache that at topology buildup
sd = highest_flag_domain(cpu, SD_WAKE_AFFINE);
if (cpumask_test_cpu(prev_cpu, sched_domain_span(sd) &&
cpu != prev_cpu)
new_cpu = wake_affine();
// Directly go to select_idle_sibling()
goto sis;
}
// !want_affine logic here
---
As for the !want_affine part, we could either keep the current domain walk
(IIUC this is what you are suggesting) or go for the extra cached pointers
I'm introducing.
Now if we are a bit bolder than that, because there are no more
(mainline) topologies w/ SD_BALANCE_WAKE, we could even turn the above
into:
---
if (wake_flags & WF_TTWU) {
if (want_affine) {
// We can cache that at topology buildup
sd = highest_flag_domain(cpu, SD_WAKE_AFFINE);
if (cpumask_test_cpu(prev_cpu, sched_domain_span(sd) &&
cpu != prev_cpu)
new_cpu = wake_affine();
}
// Directly go to select_idle_sibling()
goto sis;
}
// !want_affine logic here
---
This in turns mean we could get rid of SD_BALANCE_WAKE entirely... I'm a
bit more reluctant to that only because the last SD_BALANCE_WAKE setter was
removed fairly recently, see
a526d466798d ("sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems")