Re: [PATCH v2 2/2] thermal: core: Add sanity check for polling_delay and passive_delay

From: Daniel Lezcano
Date: Mon Jul 08 2024 - 09:59:02 EST


On 08/07/2024 15:38, Rafael J. Wysocki wrote:
On Mon, Jul 8, 2024 at 2:12 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:

On 05/07/2024 21:46, Rafael J. Wysocki wrote:
From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>

If polling_delay is nonzero and passive_delay is 0, the thermal zone
will use polling except when tz->passive is nonzero, which does not make
sense.

Also if polling_delay is nonzero and passive_delay is greater than
polling_delay, the thermal zone temperature will be updated less often
when tz->passive is nonzero. This does not make sense either.

Ensure that none of the above will happen.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
---

v1 -> v2: The patch actually matches the changelog

---
drivers/thermal/thermal_core.c | 3 +++
1 file changed, 3 insertions(+)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -1440,6 +1440,9 @@ thermal_zone_device_register_with_trips(
td->threshold = INT_MAX;
}

+ if (polling_delay && (passive_delay > polling_delay || !passive_delay))
+ passive_delay = polling_delay;

Given this is a system misconfiguration, it would make more sense to
bail out with -EINVAL. Assigning a default value in the back of the
caller will never raise its attention and can make a bad configuration
staying for a long time.

This works except for the case mentioned below.

I think that passive_delay > polling_delay can trigger a -EINVAL, but
(polling_delay && !passive_delay) cannot do it because it is regarded
as a valid case as per the below.

Right I can see ATM only this as an illogic combination:

polling_delay && passive_delay &&
(polling_delay < passive_delay)

That said, there are configurations with a passive delay set to zero but
with a non zero polling delay. For instance, a thermal zone mitigated
with a fan, so active trip points are set. Another example is when there
is only critical trip points for a thermal zone.

Actually there are multiple combinations with delays value which may
look invalid but which are actually valid.

For example, a setup with polling_delay > 0, passive_delay = 0, active
trip points, cooling map to this active trips, passive trip points
without cooling map.

IMHO, it is better to do the configuration the system is asking for,
even if it sounds weird

Except that it doesn't work as expected because if passive_delay = 0,
polling is paused when tz->passive is set.

Yes, but as there is no cooling map, there is no governor action, thus tz->passive is never set. So we can have a passive polling equal to zero without being illegal as no passive mitigation will happen.

The passive delay is really there only if there is a passive cooling device mapped to a passive trip point.

The polling delay is in charge of mitigating the active cooling device like a fan. So it is possible to mix an active trip point to mitigate with a fan and then put at a higher temperature a passive trip point with a higher sampling resolution.

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog