Re: [PATCH 2/5] thermal: devfreq_cooling: get a copy of device status

From: Ionela Voinescu
Date: Tue Dec 01 2020 - 09:56:49 EST


On Tuesday 01 Dec 2020 at 12:19:18 (+0000), Lukasz Luba wrote:
>
>
> On 12/1/20 10:36 AM, Ionela Voinescu wrote:
> > Hi,
> >
> > Sorry for the delay and for the noise on this older version. I first
> > want to understand the code better.
> >
> > On Thursday 22 Oct 2020 at 11:55:28 (+0100), Lukasz Luba wrote:
> > [..]
> > >
> > > >
> > > > > +{
> > > > > + /* Make some space if needed */
> > > > > + if (status->busy_time > 0xffff) {
> > > > > + status->busy_time >>= 10;
> > > > > + status->total_time >>= 10;
> > > > > + }
> > > >
> > > > How about removing the above code and adding here:
> > > >
> > > > status->busy_time = status->busy_time ? : 1;
> > >
> > > It's not equivalent. The code operates on raw device values, which
> > > might be big (e.g. read from counters). If it's lager than the 0xffff,
> > > it is going to be shifted to get smaller.
> > >
> >
> > Yes, the big values are handled below through the division and by making
> > total_time = 1024. These two initial checks are only to cover the
> > possibility for busy_time and total_time being 0, or busy_time >
> > total_time.
> >
> > > >
> > > > > +
> > > > > + if (status->busy_time > status->total_time)
> > > >
> > > > This check would then cover the possibility that total_time is 0.
> > > >
> > > > > + status->busy_time = status->total_time;
> > > >
> > > > But a reversal is needed here:
> > > > status->total_time = status->busy_time;
> > >
> > > No, I want to clamp the busy_time, which should not be bigger that
> > > total time. It could happen when we deal with 'raw' values from device
> > > counters.
> > >
> >
> > Yes, I understand. But isn't making total_time = busy_time accomplishing
> > the same thing?
> >
> > > >
> > > > > +
> > > > > + status->busy_time *= 100;
> > > > > + status->busy_time /= status->total_time ? : 1;
> > > > > +
> > > > > + /* Avoid division by 0 */
> > > > > + status->busy_time = status->busy_time ? : 1;
> > > > > + status->total_time = 100;
> > > >
> > > > Then all of this code can be replaced by:
> > > >
> > > > status->busy_time = (unsigned long)div64_u64((u64)status->busy_time << 10,
> > > > status->total_time);
> > > > status->total_time = 1 << 10;
> > >
> > > No, the total_time closed to 'unsigned long' would overflow.
> > >
> >
> > I'm not sure I understand. total_time gets a value of 1024, it's not
> > itself shifted by 10.
> >
> > > >
> > > > This way you gain some resolution to busy_time and the divisions in the
> > > > callers would just become shifts by 10.
> > >
> > >
> > > I don't want to gain more resolution here. I want to be prepare for raw
> > > (not processed yet) big values coming from driver.
> > >
> >
> > Agreed! The higher resolution is an extra benefit. The more important
> > benefit is that, through my suggestion, you'd be replacing all future
> > divisions by shifts.
>
> You have probably missed some bits.
> I don't see benefits, you have div64_u64() which is heavy on 32bit CPUs.
>
> Then, what is the range of these values:
> busy_time [0, 1024], total_time 1024 in your case.
> These values are used for estimating power in two cases:
> 1. in devfreq_cooling_get_requested_power()
> est_power = power * busy_time / total_time
> 2. in devfreq_cooling_power2state():
> est_power = power * total_time / busy_time
>
> As you can see above, the est_power values could overflow if total_time,
> busy_time are raw values (like in old implementation). So normalize them
> into 'some' scale. That was the motivation ('scale' motivation below).
>

Agreed! I do think scaling is necessary, but in my mind the [0, 1024] scale
made more sense.

> In your case you cannot avoid division in 2. use case, because busy_time
> can be any value in range [0, 1024].
> We could avoid the division in 1. use case, but load in cpufreq cooling
> is also in range of [0, 100], so this devfreq cooling is aligned. I
> would like to avoid situation when someone is parsing the traces
> and these two devices present different load scale.
>

Got it! Looking through the code I did overlook that 2 was reversed.

> I will think about better 'devfreq utilization' (as also Daniel
> suggested)in future, but first this EM must be in mainline and cpufreq
> cooling changes made by Viresh also there.
> But it would be more then just scale change to [0, 1024]...
>

Okay, looking forward to this. It would be nice to align all of these
utilization metrics in the future for all kinds of devices.

Thanks,
Ionela.