[PATCH v1] thermal: core: Do not fail cdev registration because of invalid initial state

From: Rafael J. Wysocki
Date: Wed Jun 05 2024 - 15:17:41 EST


From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>

It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling
device state to thermal_debug_cdev_add()") causes the ACPI fan driver
to fail probing on some systems. This turns out to be due to the _FST
control method returning an invalid value until _FSL is first evaluated
for the given fan. If this happens, the .get_cur_state() cooling device
callback returns an error and __thermal_cooling_device_register() fails
because it uses that callback after commit 31a0fa0019b0.

Arguably, _FST should not return an invalid value even if it is
evaluated before _FSL, so this may be regarded as a platform firmware
issue, but at the same time it is not a good enough reason for failing
the cooling device registration where the initial cooling device state
is only needed to initialize a thermal debug facility.

Accordingly, modify __thermal_cooling_device_register() to pass a
negative state value to thermal_debug_cdev_add() instead of failing
if the initial .get_cur_state() callback invocation fails and adjust
the thermal debug code to ignore negative cooling device state values.

Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@xxxxxxxxxxxxx
Reported-by: Laura Nao <laura.nao@xxxxxxxxxxxxx>
Tested-by: Laura Nao <laura.nao@xxxxxxxxxxxxx>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
---
drivers/thermal/thermal_core.c | 11 +++++++----
drivers/thermal/thermal_debugfs.c | 7 ++++++-
2 files changed, 13 insertions(+), 5 deletions(-)
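
The fallback described in the changelog can be modeled outside the kernel
with a minimal userspace sketch. Everything named here (get_cur_state_fn,
initial_debug_state(), struct debug_model, debug_model_add() and the fst_*
callbacks) is a hypothetical stand-in and not kernel code. The sketch drops
the "val >= 0" test because an unsigned long is never negative, so only the
INT_MAX bound matters.

```c
#include <limits.h>

/* Hypothetical userspace stand-in for the .get_cur_state() callback. */
typedef int (*get_cur_state_fn)(unsigned long *state);

/*
 * Map the callback result to the int state handed to the debug code:
 * a valid state is passed through, anything else becomes -1.
 */
static int initial_debug_state(get_cur_state_fn get_cur_state)
{
	unsigned long val = 0;
	int ret = get_cur_state(&val);

	if (!ret && val <= INT_MAX)
		return (int)val;

	return -1;
}

/* Hypothetical model of the debug side: records exist only for valid states. */
struct debug_model {
	int current_state;
	int records;
};

static void debug_model_add(struct debug_model *dbg, int state)
{
	dbg->current_state = state;
	dbg->records = 0;

	/* A negative state means "unknown", so no initial record is created. */
	if (state >= 0)
		dbg->records++;
}

/* Example callbacks: failure (-22 mirrors -EINVAL), success, out of range. */
static int fst_fails(unsigned long *state)
{
	(void)state;
	return -22;
}

static int fst_ok(unsigned long *state)
{
	*state = 3;
	return 0;
}

static int fst_out_of_range(unsigned long *state)
{
	*state = ULONG_MAX;
	return 0;
}
```

In this model a failing or out-of-range callback yields -1, and the debug
side then starts without an initial duration record instead of registration
failing altogether.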

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -964,7 +964,8 @@ __thermal_cooling_device_register(struct
 {
 	struct thermal_cooling_device *cdev;
 	struct thermal_zone_device *pos = NULL;
-	unsigned long current_state;
+	unsigned long val;
+	int current_state;
 	int id, ret;

 	if (!ops || !ops->get_max_state || !ops->get_cur_state ||
@@ -1002,9 +1003,11 @@ __thermal_cooling_device_register(struct
 	if (ret)
 		goto out_cdev_type;

-	ret = cdev->ops->get_cur_state(cdev, &current_state);
-	if (ret)
-		goto out_cdev_type;
+	ret = cdev->ops->get_cur_state(cdev, &val);
+	if (!ret && val >= 0 && val <= INT_MAX)
+		current_state = val;
+	else
+		current_state = -1;

 	thermal_cooling_device_setup_sysfs(cdev);

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -421,6 +421,8 @@ void thermal_debug_cdev_state_update(con
 	cdev_dbg = &thermal_dbg->cdev_dbg;

 	old_state = cdev_dbg->current_state;
+	if (old_state < 0)
+		goto unlock;

 	/*
 	 * Get the old state information in the durations list. If
@@ -463,6 +465,7 @@ void thermal_debug_cdev_state_update(con

 	cdev_dbg->total++;

+unlock:
 	mutex_unlock(&thermal_dbg->lock);
 }

@@ -499,7 +502,9 @@ void thermal_debug_cdev_add(struct therm
 	 * duration will be printed by cdev_dt_seq_show() as expected if it
 	 * runs before the first state transition.
 	 */
-	thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations, state);
+	if (state >= 0)
+		thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations,
+						state);

 	debugfs_create_file("trans_table", 0400, thermal_dbg->d_top,
 			    thermal_dbg, &tt_fops);