Re: [PATCH] thermal/core: Emit a warning if the thermal zone is updated without ops

From: Lukasz Luba
Date: Tue Dec 08 2020 - 04:38:23 EST


Hi Daniel,

On 12/7/20 7:05 PM, Daniel Lezcano wrote:
The current code silently ignores a thermal zone update when a
driver requests it without a get_temp ops set.

That does not look correct, as the caller should not have called this
function if the thermal zone is unable to read the temperature.

That makes the code less robust as the check won't detect the driver
is inconsistently using the thermal API and that does not help to
improve the framework as these circumvolutions hide the problem at the
source.

Makes sense.


In order to detect the situation when it happens, let's add a warning
when the update is requested without the get_temp() ops set.

Any warning emitted will have to be fixed at the source of the
problem: the caller must not call thermal_zone_device_update() if there
is no get_temp callback set.

As the check is done in thermal_zone_get_temperature() via the
update_temperature() function, it is pointless to have the check and
the WARN in the thermal_zone_device_update() function. Just remove the
check and let the next call to raise the warning.

Cc: Thara Gopinath <thara.gopinath@xxxxxxxxxx>
Cc: Amit Kucheria <amitk@xxxxxxxxxx>
Cc: linux-pm@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
---
drivers/thermal/thermal_core.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 90e38cc199f4..1bd23ff2247b 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -448,17 +448,17 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip)
monitor_thermal_zone(tz);
}
-static void update_temperature(struct thermal_zone_device *tz)
+static int update_temperature(struct thermal_zone_device *tz)
{
int temp, ret;
ret = thermal_zone_get_temp(tz, &temp);
if (ret) {
if (ret != -EAGAIN)
- dev_warn(&tz->device,
- "failed to read out thermal zone (%d)\n",
- ret);
- return;
+ dev_warn_once(&tz->device,
+ "failed to read out thermal zone (%d)\n",
+ ret);
+ return ret;
}
mutex_lock(&tz->lock);
@@ -469,6 +469,8 @@ static void update_temperature(struct thermal_zone_device *tz)
trace_thermal_temperature(tz);
thermal_genl_sampling_temp(tz->id, temp);
+
+ return 0;
}
static void thermal_zone_device_init(struct thermal_zone_device *tz)
@@ -553,11 +555,9 @@ void thermal_zone_device_update(struct thermal_zone_device *tz,
if (atomic_read(&in_suspend))
return;
- if (!tz->ops->get_temp)
+ if (update_temperature(tz))
return;
- update_temperature(tz);
-

I think the patch does a bit more than that. Previously we continued
running the code below even when thermal_zone_get_temp() returned an
error (for whatever reason). Now we return early, so we skip
handle_thermal_trip() and monitor_thermal_zone(), and probably would not
schedule the next polling.

I would leave update_temperature(tz) as it was and not check the return
value. thermal_zone_get_temp() can protect itself against a missing
tz->ops->get_temp(), so we should be safe.
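
To illustrate the difference, here is a rough userspace sketch, not the
kernel code. All the mock_* names and the struct layouts are made up for
the example, the -EINVAL return for a missing callback is an assumption,
and the trip handling and polling are collapsed into a pair of counters:

```c
#include <errno.h>
#include <stddef.h>

/* Mock types: simplified stand-ins, not the real kernel structures. */
struct mock_tz_ops {
	int (*get_temp)(int *temp);
};

struct mock_tz {
	const struct mock_tz_ops *ops;
	int temperature;
	int trips_handled;	/* stands in for handle_thermal_trip() */
	int polls_scheduled;	/* stands in for monitor_thermal_zone() */
};

/* Guards itself against a missing callback (assumed to return -EINVAL). */
static int mock_get_temp(struct mock_tz *tz, int *temp)
{
	if (!tz || !tz->ops || !tz->ops->get_temp)
		return -EINVAL;

	return tz->ops->get_temp(temp);
}

/* Returns 0 on success or the error reported by the read. */
static int mock_update_temperature(struct mock_tz *tz)
{
	int temp;
	int ret = mock_get_temp(tz, &temp);

	if (ret)
		return ret;

	tz->temperature = temp;
	return 0;
}

static void mock_handle_trips_and_poll(struct mock_tz *tz)
{
	tz->trips_handled++;
	tz->polls_scheduled++;
}

/* Old flow: only a missing get_temp stops the update, a failed read does not. */
static void mock_update_old(struct mock_tz *tz)
{
	if (!tz->ops->get_temp)
		return;

	(void)mock_update_temperature(tz);
	mock_handle_trips_and_poll(tz);
}

/* Patched flow: any read error stops the update before trips and polling. */
static void mock_update_patched(struct mock_tz *tz)
{
	if (mock_update_temperature(tz))
		return;

	mock_handle_trips_and_poll(tz);
}

/* A sensor whose read fails with an error other than -EAGAIN. */
static int mock_failing_sensor(int *temp)
{
	(void)temp;
	return -EIO;
}
```

With a zone that uses mock_failing_sensor(), mock_update_old() still bumps
polls_scheduled, while mock_update_patched() leaves it at zero. That is
the behaviour change I am worried about.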

What do you think?

Regards,
Lukasz