Re: [PATCH 1/4] sched/topology: Store root domain CPU capacity sum

From: Dietmar Eggemann
Date: Thu Apr 09 2020 - 09:50:17 EST


On 08.04.20 19:03, Vincent Guittot wrote:
> On Wed, 8 Apr 2020 at 18:31, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 08.04.20 14:29, Vincent Guittot wrote:
>>> On Wed, 8 Apr 2020 at 11:50, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>
>> [...]
>>
>>>> /**
>>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>>> index 8344757bba6e..74b0c0fa4b1b 100644
>>>> --- a/kernel/sched/topology.c
>>>> +++ b/kernel/sched/topology.c
>>>> @@ -2052,12 +2052,17 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>>> /* Attach the domains */
>>>> rcu_read_lock();
>>>> for_each_cpu(i, cpu_map) {
>>>> + unsigned long cap = arch_scale_cpu_capacity(i);
>>>
>>> Why do you replace the use of rq->cpu_capacity_orig by
>>> arch_scale_cpu_capacity(i) ?
>>> There is nothing about this change in the commit message
>>
>> True. And I can change this back.
>>
>> It seems though that the solution is not sufficient because of the
>> 'rd->span ⊄ cpu_active_mask' issue discussed under patch 2/4.
>>
>> But this reminds me of another question I have.
>>
>> Currently we use arch_scale_cpu_capacity() more often (16 times) than
>> capacity_orig_of()/rq->cpu_capacity_orig .
>>
>> What's hindering us from removing rq->cpu_capacity_orig and the code
>> around it and relying solely on arch_scale_cpu_capacity()? I mean the
>> arch implementation should be fast.
>
> Or we can do the opposite and only use capacity_orig_of()/rq->cpu_capacity_orig.
>
> Is there a case where the max cpu capacity changes over time ? So I
> would prefer to use cpu_capacity_orig which is a field of scheduler
> instead of always calling an external arch specific function

I see. So far it only changes during startup.

And it looks like asym_cpu_capacity_level() [topology.c] would fail
if we used capacity_orig_of() instead of arch_scale_cpu_capacity().

post_init_entity_util_avg() [fair.c] and sugov_get_util()
[cpufreq_schedutil.c] would be temporarily off until
update_cpu_capacity() has updated cpu_rq(cpu)->cpu_capacity_orig.
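
To make that window concrete, here is a minimal userspace sketch, not
kernel code. Every model_* name is a hypothetical stand-in, and the
initial value of 1024 and the little-CPU capacity of 446 are assumptions
picked for illustration only.

```c
/* Userspace model only, not kernel code. Every model_* name is hypothetical. */
#include <assert.h>

#define MODEL_NR_CPUS 2

/* Arch-owned value: what arch_scale_cpu_capacity() would return. */
static unsigned long model_arch_capacity[MODEL_NR_CPUS] = { 1024UL, 1024UL };

/* Scheduler-owned cached copy: stands in for rq->cpu_capacity_orig. */
static unsigned long model_capacity_orig[MODEL_NR_CPUS] = { 1024UL, 1024UL };

static unsigned long model_arch_scale_cpu_capacity(int cpu)
{
	return model_arch_capacity[cpu];
}

static unsigned long model_capacity_orig_of(int cpu)
{
	return model_capacity_orig[cpu];
}

/* Arch cpu scale setter: deliberately does not touch the scheduler copy. */
static void model_arch_set_cpu_scale(int cpu, unsigned long cap)
{
	model_arch_capacity[cpu] = cap;
}

/* Models only the cpu_capacity_orig assignment in update_cpu_capacity(). */
static void model_update_cpu_capacity(int cpu)
{
	model_capacity_orig[cpu] = model_arch_scale_cpu_capacity(cpu);
}

/* Walks CPU 0 through the startup window; returns the final cached value. */
static unsigned long model_startup_window(void)
{
	model_arch_set_cpu_scale(0, 446UL);	/* assumed little-CPU capacity */

	/* Window: the cached copy still holds the initial value. */
	assert(model_capacity_orig_of(0) != model_arch_scale_cpu_capacity(0));

	model_update_cpu_capacity(0);

	/* After the update both accessors agree again. */
	assert(model_capacity_orig_of(0) == model_arch_scale_cpu_capacity(0));
	return model_capacity_orig_of(0);
}
```

Any reader of the cached copy that runs between the setter and the
update sees the initial value, which is the "temporarily off" case
above.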

compute_energy() [fair.c] is kept from running at startup by the
sched_energy_enabled() guard.

scale_rt_capacity() could be changed if we call it after the
cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(cpu) assignment
in update_cpu_capacity().

The Energy Model (and CPUfreq cooling) code would need
capacity_orig_of() exported. arch_scale_cpu_capacity() is currently
exported via include/linux/sched/topology.h.

I guess PELT and 'scale invariant Deadline bandwidth enforcement' should
continue using arch_scale_cpu_capacity(), in sync with
arch_scale_freq_capacity().

IMHO it's hard to give clear advice on when to use one or the other.

We probably don't want to set cpu_rq(cpu)->cpu_capacity_orig in the arch
cpu scale setter. We have arch_scale_cpu_capacity() to decouple that.