Re: [PATCH v2 2/2] thermal: power_allocator: update once cooling devices when temp is low

From: Daniel Lezcano
Date: Tue Apr 20 2021 - 09:30:27 EST


On 19/04/2021 10:45, Lukasz Luba wrote:
> The cooling device state change generates an event, also when there is no
> need, because temperature is low and device is not throttled. Avoid to
> unnecessary update the cooling device which means also not sending event.
> The cooling device state has not changed because the temperature is still
> below the first activation trip point value, so we can do this.
> Add a tracking mechanism to make sure it updates cooling devices only
> once - when the temperature dropps below first trip point.
>
> Reported-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
> Signed-off-by: Lukasz Luba <lukasz.luba@xxxxxxx>
> ---
> drivers/thermal/gov_power_allocator.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
> index d393409fb786..f379f1aaa3b5 100644
> --- a/drivers/thermal/gov_power_allocator.c
> +++ b/drivers/thermal/gov_power_allocator.c
> @@ -571,7 +571,7 @@ static void reset_pid_controller(struct power_allocator_params *params)
> params->prev_err = 0;
> }
>
> -static void allow_maximum_power(struct thermal_zone_device *tz)
> +static void allow_maximum_power(struct thermal_zone_device *tz, bool update)
> {
> struct thermal_instance *instance;
> struct power_allocator_params *params = tz->governor_data;
> @@ -594,9 +594,13 @@ static void allow_maximum_power(struct thermal_zone_device *tz)
> */
> cdev->ops->get_requested_power(cdev, &req_power);
>
> - instance->cdev->updated = false;
> + if (update)
> + instance->cdev->updated = false;
> +
> mutex_unlock(&instance->cdev->lock);
> - (instance->cdev);
> +
> + if (update)
> + thermal_cdev_update(instance->cdev);

This cdev update has something bad IMHO. It is protected by a mutex but
the 'updated' field is left unprotected before calling
thermal_cdev_update().

It is not the fault of this code but how the cooling device are updated
and how it interacts with the thermal instances.

IMO, part of the core code needs to revisited.

This change tight a bit more the knot.

Would it make sense to you if we create a function eg.
__thermal_cdev_update()

And then we have:

void thermal_cdev_update(struct thermal_cooling_device *cdev)
{
mutex_lock(&cdev->lock);
/* cooling device is updated*/
if (cdev->updated) {
mutex_unlock(&cdev->lock);
return;
}

__thermal_cdev_update(cdev);

thermal_cdev_set_cur_state(cdev, target);

cdev->updated = true;
mutex_unlock(&cdev->lock);
trace_cdev_update(cdev, target);
dev_dbg(&cdev->device, "set to state %lu\n", target);
}

And in this file we do instead:

- instance->cdev->updated = false;
+ if (update)
+ __thermal_cdev_update(instance->cdev);
mutex_unlock(&instance->cdev->lock);
- thermal_cdev_update(instance->cdev);

> }
> mutex_unlock(&tz->lock);
> }
> @@ -710,6 +714,7 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip)
> int ret;
> int switch_on_temp, control_temp;
> struct power_allocator_params *params = tz->governor_data;
> + bool update;
>
> /*
> * We get called for every trip point but we only need to do
> @@ -721,9 +726,10 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip)
> ret = tz->ops->get_trip_temp(tz, params->trip_switch_on,
> &switch_on_temp);
> if (!ret && (tz->temperature < switch_on_temp)) {
> + update = (tz->last_temperature >= switch_on_temp);
> tz->passive = 0;
> reset_pid_controller(params);
> - allow_maximum_power(tz);
> + allow_maximum_power(tz, update);
> return 0;
> }
>
>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog