Re: [Patch v9 7/8] sched/fair: Enable tuning of decay period

From: Dietmar Eggemann
Date: Wed Feb 19 2020 - 04:15:32 EST


On 18/02/2020 15:57, Thara Gopinath wrote:
>
>
> On 2/14/20 5:26 AM, Dietmar Eggemann wrote:
>> On 13/02/2020 14:54, Thara Gopinath wrote:
>>> On 02/10/2020 06:59 AM, Dietmar Eggemann wrote:
>>>> On 07/02/2020 23:42, Thara Gopinath wrote:
>>>>> On 02/04/2020 03:39 AM, Dietmar Eggemann wrote:
>>>>>> On 03/02/2020 16:55, Peter Zijlstra wrote:
>>>>>>> On Mon, Feb 03, 2020 at 07:07:57AM -0500, Thara Gopinath wrote:
>>>>>>>> On 01/28/2020 06:56 PM, Randy Dunlap wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 1/28/20 2:36 PM, Thara Gopinath wrote:

[...]

>>>> Cpu-invariant accounting can't be guarded with a kernel CONFIG switch.
>>>> Frequency-invariant accounting could be with CONFIG_CPU_FREQ but
>>>> this is
>>>> enabled by default by Arm64 defconfig.
>>>> Thermal pressure (accounting) (CONFIG_HAVE_SCHED_THERMAL_PRESSURE) is
>>>> disabled by default so why should a per-cpu thermal_pressure be
>>>> maintained on such a system (CONFIG_CPU_THERMAL=y by default)?
>>>
>>> I agree that there is no need for per-cpu thermal pressure to be
>>> maintained if no averaging is happening in the scheduler, today. I don't
>>> know if there will ever be an use for it.
>>
>> All arch_scale_FOO() functions follow the approach to force the arch
>> (currently x86, arm, arm64) to do
>>
>> #define arch_scale_FOO BAR
>>
>> to enable the FOO functionality.
>>
>> There is no direct link between consumer and provider here.
>>
>> Â consumer (sched) -> arch <- provider (arch, counters, CPUfreq, CPU
>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ cooling, etc.)
>>
>> So IMHO, FOO=thermal_pressure should follow this design pattern too.
>>
>> 'thermal_pressure' would be the only one which can be disabled by a
>> kernel config switch at the consumer side.
>> IMHO, it doesn't make sense to have the provider operating in this case.
>>
>>> My issue has to do with using a config option meant for internal
>>> scheduler code being used else where. To me, once this happens, the
>>> entire work done to separate out reading and writing of instantaneous
>>> thermal pressure to arch_topology makes no sense. We could have kept it
>>> in scheduler itself.
>>
>> You might see thermal_pressure more on the level of irq_load or
>> [rt/dl]_rq_load and that could be why we have a different opinion here?
>>
>> Now rt_rq_load and dl_rq_load are scheduler internal providers and
>> irq_load is driven by 'irq_delta + steal' time (which is much closer to
>> the scheduler than thermal for instance).
>
> In this case, thermal pressure is quite close to scheduler as it reduces
> the maximum capacity available per cpu and hence affects scheduler
> placement of tasks
>
>>
>> My assumption is that we don't want a direct link between the scheduler
>> and e.g. a provider 'thermal'.
>
> Exactly. Which is why the same CONFIG option should not be used between
> the provider and consumer.

I think there is a little misunderstanding here. By being close to the
scheduler I was referring to rt, dl, irq which are not driven via an
arch_scale_FOO function.

But I guess we agree that FOO=thermal_pressure should use this
arch_scale_FOO function so we don't have a direct link between scheduler
and thermal subsystem.

We disagree in the point whether the provider should be present and
working when the only consumer is disabled by the kernel config.

I guess we can't discuss the technical angle of this issue any further
so maybe the maintainer of drivers/base/arch_topology.c should make a
decision (the actual code is in 3/8 of this patch-set).

>>> Another way I think about this whole thermal pressure framework is that
>>> it is the job of cooling device or cpufreq or any other entity to update
>>> a throttle in maximum pressure to the scheduler. It should be
>>> independent of what scheduler does with it. Scheduler can choose to
>>> ignore it