Re: [PATCH 2/5] thermal: devfreq_cooling: get a copy of device status

From: Lukasz Luba
Date: Tue Dec 01 2020 - 07:20:34 EST




On 12/1/20 10:36 AM, Ionela Voinescu wrote:
Hi,

Sorry for the delay and for the noise on this older version. I first
want to understand the code better.

On Thursday 22 Oct 2020 at 11:55:28 (+0100), Lukasz Luba wrote:
[..]


+{
+ /* Make some space if needed */
+ if (status->busy_time > 0xffff) {
+ status->busy_time >>= 10;
+ status->total_time >>= 10;
+ }

How about removing the above code and adding here:

status->busy_time = status->busy_time ? : 1;

It's not equivalent. The code operates on raw device values, which
might be big (e.g. read from counters). If it's lager than the 0xffff,
it is going to be shifted to get smaller.


Yes, the big values are handled below through the division and by making
total_time = 1024. These two initial checks are only to cover the
possibility for busy_time and total_time being 0, or busy_time >
total_time.


+
+ if (status->busy_time > status->total_time)

This check would then cover the possibility that total_time is 0.

+ status->busy_time = status->total_time;

But a reversal is needed here:
status->total_time = status->busy_time;

No, I want to clamp the busy_time, which should not be bigger that
total time. It could happen when we deal with 'raw' values from device
counters.


Yes, I understand. But isn't making total_time = busy_time accomplishing
the same thing?


+
+ status->busy_time *= 100;
+ status->busy_time /= status->total_time ? : 1;
+
+ /* Avoid division by 0 */
+ status->busy_time = status->busy_time ? : 1;
+ status->total_time = 100;

Then all of this code can be replaced by:

status->busy_time = (unsigned long)div64_u64((u64)status->busy_time << 10,
status->total_time);
status->total_time = 1 << 10;

No, the total_time closed to 'unsigned long' would overflow.


I'm not sure I understand. total_time gets a value of 1024, it's not
itself shifted by 10.


This way you gain some resolution to busy_time and the divisions in the
callers would just become shifts by 10.


I don't want to gain more resolution here. I want to be prepare for raw
(not processed yet) big values coming from driver.


Agreed! The higher resolution is an extra benefit. The more important
benefit is that, through my suggestion, you'd be replacing all future
divisions by shifts.

You have probably missed some bits.
I don't see benefits, you have div64_u64() which is heavy on 32bit CPUs.

Then, what is the range of these values:
busy_time [0, 1024], total_time 1024 in your case.
These values are used for estimating power in two cases:
1. in devfreq_cooling_get_requested_power()
est_power = power * busy_time / total_time
2. in devfreq_cooling_power2state():
est_power = power * total_time / busy_time

As you can see above, the est_power values could overflow if total_time,
busy_time are raw values (like in old implementation). So normalize them
into 'some' scale. That was the motivation ('scale' motivation below).

In your case you cannot avoid division in 2. use case, because busy_time
can be any value in range [0, 1024].
We could avoid the division in 1. use case, but load in cpufreq cooling
is also in range of [0, 100], so this devfreq cooling is aligned. I
would like to avoid situation when someone is parsing the traces
and these two devices present different load scale.

I will think about better 'devfreq utilization' (as also Daniel
suggested)in future, but first this EM must be in mainline and cpufreq
cooling changes made by Viresh also there.
But it would be more then just scale change to [0, 1024]...

Regards,
Lukasz



Thanks,
Ionela.

Regards,
Lukasz


Hope it helps,
Ionela.