Re: [PATCH v3] thermal/core: Clear all mitigation when thermal zone is disabled

From: Manaf Meethalavalappu Pallikunhi
Date: Mon Jan 10 2022 - 15:46:06 EST


Hi Thara,

On 1/10/2022 11:25 PM, Thara Gopinath wrote:
Hi Manaf,

On 1/7/22 1:56 PM, Manaf Meethalavalappu Pallikunhi wrote:
Whenever a thermal zone is in a trip-violated state, that same thermal
zone can still be disabled, either via the thermal core API or via the
thermal zone sysfs. Once it is disabled, the framework bails out of any
re-evaluation of the thermal zone. As a result, if the zone is already
in a mitigation state, it stays in that state until it is re-enabled.

To avoid this issue, on a thermal zone disable request, reset the
thermal zone and explicitly clear the mitigation for each trip.

Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@xxxxxxxxxxx>
---
  drivers/thermal/thermal_core.c | 12 ++++++++++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 51374f4..e288c82 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -447,10 +447,18 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
        thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
-    if (mode == THERMAL_DEVICE_ENABLED)
+    if (mode == THERMAL_DEVICE_ENABLED) {
          thermal_notify_tz_enable(tz->id);
-    else
+    } else {
+        int trip;
+
+        /* make sure all previous throttlings are cleared */
+        thermal_zone_device_init(tz);

It looks weird to do an init when you are actually disabling the thermal zone.


+        for (trip = 0; trip < tz->trips; trip++)
+            handle_thermal_trip(tz, trip);

So this is exactly what thermal_zone_device_update does except that thermal_zone_device_update checks for the mode and bails out if the zone is disabled.
This will work because as you explained in v2, the temperature is reset in thermal_zone_device_init and handle_thermal_trip will remove the mitigation if any.
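
The mechanism described above can be sketched in user space. This is a simplified model, not kernel code: `struct toy_zone`, the `toy_*` helpers, and the rule "throttle while temperature >= trip temperature" are assumptions made for illustration, and `TOY_TEMP_INVALID` only stands in for THERMAL_TEMP_INVALID. It shows why resetting the cached temperature and then re-evaluating every trip leaves no trip throttled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_TEMP_INVALID (-274000) /* stand-in for THERMAL_TEMP_INVALID (assumed value) */
#define TOY_NUM_TRIPS 2

/* Hypothetical, heavily reduced stand-in for struct thermal_zone_device. */
struct toy_zone {
	int temperature;              /* last cached temperature, millidegrees C */
	int prev_low_trip;
	int prev_high_trip;
	int trip_temp[TOY_NUM_TRIPS]; /* trip thresholds, millidegrees C */
	bool throttled[TOY_NUM_TRIPS];
};

/* Models the reset step: forget the cached temperature and trip window. */
static void toy_zone_init(struct toy_zone *tz)
{
	tz->temperature = TOY_TEMP_INVALID;
	tz->prev_low_trip = -INT_MAX;
	tz->prev_high_trip = INT_MAX;
}

/* Models per-trip handling: throttle only while the trip is violated. */
static void toy_handle_trip(struct toy_zone *tz, int trip)
{
	assert(trip >= 0 && trip < TOY_NUM_TRIPS);
	tz->throttled[trip] = tz->temperature >= tz->trip_temp[trip];
}

/* Models the disable path in the patch: reset, then re-evaluate each trip. */
static void toy_zone_disable(struct toy_zone *tz)
{
	int trip;

	toy_zone_init(tz);
	for (trip = 0; trip < TOY_NUM_TRIPS; trip++)
		toy_handle_trip(tz, trip);
}

/* Counts how many trips are still being mitigated. */
static int toy_throttled_count(const struct toy_zone *tz)
{
	int trip, count = 0;

	for (trip = 0; trip < TOY_NUM_TRIPS; trip++)
		count += tz->throttled[trip] ? 1 : 0;
	return count;
}
```

In this model, a zone at 95000 with trips at 70000 and 90000 is throttled on both trips, and toy_zone_disable() clears both because the invalid temperature is below every trip threshold.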

My two cents here (Rafael and Daniel can comment more on this).

I think it will be cleaner if we can have a third mode, THERMAL_DEVICE_DISABLING, and have thermal_zone_device_update handle clearing the mitigation. It would look like this:
if (mode == THERMAL_DEVICE_DISABLED)
    tz->mode = THERMAL_DEVICE_DISABLING;
else
    tz->mode = mode;

thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);

if (mode == THERMAL_DEVICE_DISABLED)
    tz->mode = mode;

You will have to update update_temperature to set tz->temperature = THERMAL_TEMP_INVALID, and thermal_zone_set_trips to set tz->prev_low_trip = -INT_MAX and tz->prev_high_trip = INT_MAX, for THERMAL_DEVICE_DISABLING mode.
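
The transient-mode idea can be sketched the same way in user space. Everything here is hypothetical: `struct mode_zone`, the `MODE_*` enumerators, and the `mode_zone_*` helpers are illustration-only names, and a single trip with a "throttle while temperature >= trip temperature" rule stands in for the real governors. The sketch only shows the control flow: the update path runs once in the transient mode, clears the cached state, and then bails out for good once the zone is fully disabled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MODE_TEMP_INVALID (-274000) /* stand-in for THERMAL_TEMP_INVALID (assumed value) */

enum mode_state {
	MODE_DISABLED,
	MODE_ENABLED,
	MODE_DISABLING, /* hypothetical transient mode from the proposal */
};

/* Hypothetical single-trip model of a thermal zone. */
struct mode_zone {
	enum mode_state mode;
	int sensor_temp;    /* what the sensor would report, millidegrees C */
	int temperature;    /* cached temperature used for trip evaluation */
	int prev_low_trip;
	int prev_high_trip;
	int trip_temp;
	bool throttled;
};

/* Models the update path: bail out when disabled, reset when disabling. */
static void mode_zone_update(struct mode_zone *tz)
{
	if (tz->mode == MODE_DISABLED)
		return; /* no re-evaluation for a disabled zone */

	if (tz->mode == MODE_DISABLING) {
		tz->temperature = MODE_TEMP_INVALID;
		tz->prev_low_trip = -INT_MAX;
		tz->prev_high_trip = INT_MAX;
	} else {
		tz->temperature = tz->sensor_temp;
	}

	tz->throttled = tz->temperature >= tz->trip_temp;
}

/* Models set_mode: pass through the transient mode before disabling. */
static void mode_zone_set_mode(struct mode_zone *tz, enum mode_state mode)
{
	assert(tz != NULL);

	tz->mode = (mode == MODE_DISABLED) ? MODE_DISABLING : mode;
	mode_zone_update(tz);
	if (mode == MODE_DISABLED)
		tz->mode = mode;
}
```

Note that this model does not cover per-instance governor state, so it says nothing about whether resetting these three fields alone is sufficient for every governor.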

I think just updating the above fields doesn't guarantee that mitigation is completely cleared for all governors. For the step_wise governor, to make sure mitigation is removed completely, we also have to set instance->initialized = false for each thermal instance.

If we add that to the above list of variables in update_temperature() under if (mode == THERMAL_DEVICE_DISABLING), it is the same as what the thermal_zone_device_init() function does in the current patch. We would just be resetting the same fields in a different place under a new mode, right?

Thanks,

Manaf