Re: [PATCH v2] thermal:core:fix: Check return code of the ->get_max_state() callback

From: Eduardo Valentin
Date: Fri Nov 21 2014 - 13:08:45 EST



Lukasz,

On Tue, Nov 18, 2014 at 11:16:30AM +0100, Lukasz Majewski wrote:
> The return code from ->get_max_state() callback was not checked during
> binding cooling device to thermal zone device.
>
> Signed-off-by: Lukasz Majewski <l.majewski@xxxxxxxxxxx>
> ---
> Changes for v2:
> - It turned out that patches from 1 to 6 from v1 are not needed, since
> they either already solve the problem (like imx_thermal.c) or not use
> cpufreq as a thermal cooling device.
> - The patch 7 from v1 is also not needed since this patch on error exits
> this function without using max_state variable.
> - In thermal driver probe the cpufreq_cooling_register() method presence
> is crucial to evaluate if the thermal driver needs any actions with
> -EPROBE_DEFER.

Have you tried this patch with of-thermal based systems?

The above proposal works if the thermal driver is dealing with loading
cpu_cooling. But for of-thermal based drivers, the idea is to leave to
cpufreq code to load it.

As an example, I am taking the ti-soc-thermal, but we already have other
of-thermal based drivers. Booting with this patch ti-soc-thermal
(of-based boot) loads fine, but the cpu_cooling never gets bound to the
thermal zone.

The thing is that the bind may happen before cpufreq-dt code loads the
cpufreq driver, and when cpu_cooling is checking what is the max freq,
by using cpufreq table, it won't be able to do it, as there is no table.

While, without the patch, it will use wrong in the binding, but after
it gets bound, and cpufreq loads, the max will be used correctly.

And in this case, the system still works besides this bug. The reasoning
is because the max state comes from DT (2) and lower and upper wont be
equal to THERMAL_NO_LIMIT. Then, the following check will use the
parameter, instead of max_state:

cdev->ops->get_max_state(cdev, &max_state);

/* lower default 0, upper default max_state */
lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
upper = upper == THERMAL_NO_LIMIT ?
max_state : upper;

In summary, introducing this patch, although it fix a problem, will
introduce regressions, in of-thermal based drivers.

I believe, to have this fix, you need to provide a way to have probing
deferring also in cpu_cooling. That needs also the change in the cpufreq
driver, as I mentioned in the other thread.

Cheers,

> ---
> drivers/thermal/thermal_core.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 43b9070..8567929 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -928,7 +928,7 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
> struct thermal_zone_device *pos1;
> struct thermal_cooling_device *pos2;
> unsigned long max_state;
> - int result;
> + int result, ret;
>
> if (trip >= tz->trips || (trip < 0 && trip != THERMAL_TRIPS_NONE))
> return -EINVAL;
> @@ -945,7 +945,9 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
> if (tz != pos1 || cdev != pos2)
> return -EINVAL;
>
> - cdev->ops->get_max_state(cdev, &max_state);
> + ret = cdev->ops->get_max_state(cdev, &max_state);
> + if (ret)
> + return ret;
>
> /* lower default 0, upper default max_state */
> lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> --
> 2.0.0.rc2
>

Attachment: signature.asc
Description: Digital signature