Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms

From: Viresh Kumar
Date: Mon Nov 23 2020 - 05:41:45 EST


On 20-11-20, 14:51, Lukasz Luba wrote:
> On 11/19/20 7:38 AM, Viresh Kumar wrote:
> > Scenario 1: The CPUs were mostly idle in the previous polling window of
> > the IPA governor as the tasks were sleeping and here are the details
> > from traces (load is in %):
> >
> > Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
> > New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960
> >
> > Here, the "Old" line gives the load and requested_power (dynamic_power
> > here) numbers calculated using the idle time based implementation, while
> > "New" is based on the CPU utilization from scheduler.
> >
> > As can be clearly seen, the load and requested_power numbers are simply
> > incorrect in the idle time based approach and the numbers collected from
> > CPU's utilization are much closer to the reality.
>
> This contradicts what you put in the 'Scenario 1' description,
> doesn't it?

At least I didn't think so when I wrote this and am still not sure :)

> Frequency at 1.2GHz, 75% total_load, power 4W... I'd say if CPUs were
> mostly idle then 1.3W would better reflect that state.

The CPUs were idle because the tasks were sleeping, but once the tasks
resume work, we need a frequency that matches the real load of the
tasks. This is exactly what schedutil would ask for as well, since it
uses the same metric, so we should be asking for the same power
budget.
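
To make the Scenario 1 numbers easier to check, here is a small userspace
sketch of the arithmetic behind the two trace lines. It assumes that
total_load is the plain sum of the per-CPU loads (printed in hex, each
0..100 %) and that dynamic_power is the full-load power at the current
frequency scaled by total_load with integer division. RAW_POWER_1200MHZ
(660 mW) is not stated anywhere in this thread; it is inferred from the
traces, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical full-load power at 1.2 GHz in mW, inferred from the traces. */
#define RAW_POWER_1200MHZ 660

/* Per-CPU loads copied from the "Old" and "New" trace lines (percent). */
static const uint8_t old_load[8] = {0x35, 0x1, 0x0, 0x31, 0x0, 0x0, 0x64, 0x0};
static const uint8_t new_load[8] = {0x60, 0x46, 0x45, 0x45, 0x48, 0x3b, 0x61, 0x44};

/* Sum the per-CPU loads, which is how total_load appears to be reported. */
static uint32_t total_load(const uint8_t *load, size_t ncpus)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < ncpus; i++)
		sum += load[i];
	return sum;
}

/* Assumed scaling: full-load power times load in percent, rounded down. */
static uint32_t dynamic_power(uint32_t raw_power_mw, uint32_t load)
{
	return raw_power_mw * load / 100;
}

/* Check the sketch against the values printed in the two trace lines. */
int check_scenario1(void)
{
	assert(total_load(old_load, 8) == 203);
	assert(dynamic_power(RAW_POWER_1200MHZ, 203) == 1339);
	assert(total_load(new_load, 8) == 600);
	assert(dynamic_power(RAW_POWER_1200MHZ, 600) == 3960);
	return 0;
}
```

Under these assumptions both lines use the same power-per-load scaling, so
the difference in requested power comes only from how the load was
estimated.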

> What was the IPA period in your setup?

It is 100 ms by default, though I remember that I tried with 10 ms as
well.

> It depends on your platform IPA period (e.g. 100ms) and your current
> runqueues state (at that sampling point in time). The PELT decay/rise
> period is different. I am not sure you are observing the system avg load
> over the last e.g. 100ms when looking at these signals. Maybe the IPA
> period is too short/long and couldn't catch up with the PELT signals?
> But we don't want too short an averaging period, since 16ms is a display tick.
>
> IMHO based on this result it looks like the util could have lost older
> information from the past, or hasn't converged to this low load yet.
>
> >
> > Scenario 2: The CPUs were busy in the previous polling window of the IPA
> > governor:
> >
> > Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
> > New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672
> >
> > As can be seen, the idle time based load is 100% for all the CPUs as it
> > took only the last window into account, but in reality the CPUs aren't
> > that loaded as shown by the utilization numbers.
>
> This is also odd. The ~88% total_load looks like it started decaying,
> or didn't converge to 100% yet, or some task vanished?

They must have decayed a bit because of the idle period, so it looks
okay that way.
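
As a rough illustration of how quickly such a decay shows up, here is a
floating-point userspace sketch of PELT's geometric decay. It assumes the
usual half-life of 32 periods of about 1 ms each (y^32 = 0.5) and a CPU
that goes from fully utilized to completely idle. The kernel itself uses
fixed-point tables, and util_after_idle() is a made-up helper, so treat
this as an approximation and not as what the traces measured.

```c
#include <assert.h>

/* Per-period decay factor, chosen so that PELT_Y^32 is about 0.5. */
#define PELT_Y 0.97857206

/*
 * Utilization (in percent) left after idle_periods fully idle periods of
 * roughly 1 ms each, starting from util_pct.
 */
static double util_after_idle(double util_pct, unsigned int idle_periods)
{
	while (idle_periods--)
		util_pct *= PELT_Y;
	return util_pct;
}

/* Sanity check: one half-life (32 periods) halves the signal. */
int check_half_life(void)
{
	double u = util_after_idle(100.0, 32);

	assert(u > 49.9 && u < 50.1);
	return 0;
}
```

With these assumptions, about 6 ms of idle time is enough to take a 100%
signal down to roughly 88%, which is in the same range as the per-CPU
values in the "New" line of Scenario 2.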

--
viresh