Re: [PATCH V3] thermal/core/power_allocator: avoid thermal cdev can not be reset

From: Di Shen
Date: Thu May 25 2023 - 02:39:47 EST


On Fri, Apr 14, 2023 at 11:21 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
>
>
>
> On 4/14/23 16:06, Daniel Lezcano wrote:
> > On 14/04/2023 16:18, Lukasz Luba wrote:
> >>
> >>
> >> On 4/14/23 12:12, Daniel Lezcano wrote:
> >>> On 13/04/2023 10:40, Di Shen wrote:
> >>>> We have discussed this question in patch-v1:
> >>>> https://lore.kernel.org/all/f6aaa5f1-495d-a158-14d8-ddb2bffbd9c2@xxxxxxx/
> >>>>
> >>>> Simply put, we use the trip_temp in the Android System; set different
> >>>> trip_temp for thermal control of different scenarios.
> >>>
> >>> The changes are dealing with the trip points and trying to detect the
> >>> threshold. That part should be handled in the thermal core or thermal
> >>> trip side, not in the governor.
> >>>
> >>> AFAICT, if a trip point is changed, then the power allocator should
> >>> be reset, including the cdev state.
> >>>
> >>> It would be more convenient to add an ops to the governor ops
> >>> structure to reset the governor and then call it when a trip point is
> >>> changed in thermal_zone_set_trip()
> >>>
> >>>
> >>
> >> Sounds reasonable to have a proper API and fwk handling this corner
> >> case scenario.
> >> Although, if there is a need for a 'easy-to-backport' fix for IPA only,
> >> I agree with this patch, since it's straight forward to put in some
> >> Android kernel. We can later fix the framework to handle this properly.
> >> Anyway, both ways are OK to me.
> >
> > Unfortunately, we can not do the maintenance of the Linux kernel based
> > on an 'easy-to-backport' policy to Android.
> >
> > This patch could be applied from-list to Android as a hotfix. But for
> > Linux the fix should be more elaborated. One solution is to add a
> > 'reset' ops and call it from the trip point update function.
>
> Fair enough.
>
> >
> > Did you double check the issue is not impacting the other governors too ?
>
> No, unfortunately, I haven't checked other governors.

Hi Lukasz and Daniel,
I rethought about this issue, and have tried three ways to solve it.
Finally, I realized that the root cause might be the cdev->state
update and notify. We should get back to take cdev->state into
account.

The three ways:
1.From trips updating perspective:
As your suggestion,add an ops function for thermal_governor
structure,define it in IPA governor, and call it when trips are
changed by userspace(sysfs node).

2.From cdev->state updating perspective:
For example, for gov_power_allocator there are two branches reached to
__thermal_cdev_update.

power_allocator_trottle
|
allow_maximum_power()[gov_power_allocator.c]
|
__thermal_cdev_update()[thermal_helpers.c]<<<<<<<(1)
|
thermal_cdev_set_cur_state()
|
cdev->ops->set_cur_state()
|
thermal_notify_cdev_state_update()
|
.......


power_allocator_throttle()[gov_power_allocator.c]
|
allocate_power()
|
power_actor_set_power()
|
__thermal_cdev_update()[thermal_helpers.c]<<<<<<(2)
|
......

Add a variable last_state for thermal_cooling_device structure to
record the last target cooling state, and before
thermal_notify_cdev_state_update, determine whether the last_state is
equal to current state. If not equal, then
thermal_notify_cdev_state_update.

static void thermal_cdev_set_cur_state(struct thermal_cooling_device
*cdev,
~ unsigned long target)
{
if (cdev->ops->set_cur_state(cdev, target))
return;

~ if (cdev->last_state != target)
+ thermal_notify_cdev_state_update(cdev->id, target);
+
+ cdev->last_state = target;
+
thermal_cooling_device_stats_update(cdev, target);
}

In this way, it will only update and notify when the state is changed
which means there's no need to use update flag to make sure it updates
cdev->state only once.

It can avoid a lot of unnecessary notifications, not only when the
temperature drops below the first trip point(at this situation
cdev->state is always 0) but also when the policy is working.

3.Similar to the second method, but an easier one.
Actually, in the set_cur_state ops of every cooling device, it has
already checked whether the last cooling state is equal to current
value or not, and returns 0. Like cpufreq cooling device:
static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
unsigned long state)
{
//.......
/* Request state should be less than max_level */
if (state > cpufreq_cdev->max_level)
return -EINVAL;

/* Check if the old cooling action is same as new cooling
action */
if (cpufreq_cdev->cpufreq_state == state)
return 0; //return -EAGAIN;
}

What if return a non-zero value? 1 or -EAGAIN(means thy again)? Then
thermal_cdev_set_cur_state() in __thermal_cdev_update() can return
directly without update and notify.

I prefer method 3. Because there's no more new variable or function
compared to 1 and 2, and it can make code more brief.

Well, what do you think about the three ways? Look forward to your
comments. Thank you!

Best regards,
Di