Re: [PATCH v1] thermal: core: Do not fail cdev registration because of invalid initial state

From: srinivas pandruvada
Date: Wed Jun 05 2024 - 23:41:47 EST


On Wed, 2024-06-05 at 21:17 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling
> device state to thermal_debug_cdev_add()") causes the ACPI fan driver
> to fail probing on some systems, which turns out to be due to the _FST
> control method returning an invalid value until _FSL is first evaluated
> for the given fan.  If this happens, the .get_cur_state() cooling device
> callback returns an error and __thermal_cooling_device_register() fails
> as it uses that callback after commit 31a0fa0019b0.
>
> Arguably, _FST should not return an inavlid
s/inavlid/invalid

Thanks,
Srinivas

> value even if it is
> evaluated before _FSL, so this may be regarded as a platform firmware
> issue, but at the same time it is not a good enough reason for failing
> the cooling device registration where the initial cooling device state
> is only needed to initialize a thermal debug facility.
>
> Accordingly, modify __thermal_cooling_device_register() to pass a
> negative state value to thermal_debug_cdev_add() instead of failing
> if the initial .get_cur_state() callback invocation fails and adjust
> the thermal debug code to ignore negative cooling device state values.
>
> Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
> Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@xxxxxxxxxxxxx
> Reported-by: Laura Nao <laura.nao@xxxxxxxxxxxxx>
> Tested-by: Laura Nao <laura.nao@xxxxxxxxxxxxx>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> ---
>  drivers/thermal/thermal_core.c    |   11 +++++++----
>  drivers/thermal/thermal_debugfs.c |    7 ++++++-
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -964,7 +964,8 @@ __thermal_cooling_device_register(struct
>  {
>   struct thermal_cooling_device *cdev;
>   struct thermal_zone_device *pos = NULL;
> - unsigned long current_state;
> + unsigned long val;
> + int current_state;
>   int id, ret;
>  
>   if (!ops || !ops->get_max_state || !ops->get_cur_state ||
> @@ -1002,9 +1003,11 @@ __thermal_cooling_device_register(struct
>   if (ret)
>   goto out_cdev_type;
>  
> - ret = cdev->ops->get_cur_state(cdev, &current_state);
> - if (ret)
> - goto out_cdev_type;
> + ret = cdev->ops->get_cur_state(cdev, &val);
> + if (!ret && val >= 0 && val <= INT_MAX)
> + current_state = val;
> + else
> + current_state = -1;
>  
>   thermal_cooling_device_setup_sysfs(cdev);
>  
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -421,6 +421,8 @@ void thermal_debug_cdev_state_update(con
>   cdev_dbg = &thermal_dbg->cdev_dbg;
>  
>   old_state = cdev_dbg->current_state;
> + if (old_state < 0)
> + goto unlock;
>  
>   /*
>   * Get the old state information in the durations list. If
> @@ -463,6 +465,7 @@ void thermal_debug_cdev_state_update(con
>  
>   cdev_dbg->total++;
>  
> +unlock:
>   mutex_unlock(&thermal_dbg->lock);
>  }
>  
> @@ -499,7 +502,9 @@ void thermal_debug_cdev_add(struct therm
>   * duration will be printed by cdev_dt_seq_show() as expected if it
>   * runs before the first state transition.
>   */
> - thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations, state);
> + if (state >= 0)
> + thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations,
> + state);
>  
>   debugfs_create_file("trans_table", 0400, thermal_dbg->d_top,
>       thermal_dbg, &tt_fops);
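
For readers following the thread, the fallback described in the changelog can be modelled in plain userspace C. This is only a rough sketch under stated assumptions, not the kernel code: every name prefixed with model_ is hypothetical, the structures are simplified stand-ins for the thermal core and debugfs data, and -22 merely stands in for an errno-style failure from the callback. The sketch shows the initial state being mapped to -1 when the callback fails, and the debug bookkeeping ignoring a negative state.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a cooling device; not the kernel structure. */
struct model_cdev {
	/* Returns 0 and fills *state on success, a negative value on error. */
	int (*get_cur_state)(struct model_cdev *cdev, unsigned long *state);
};

/* Hypothetical stand-in for the per-device debug bookkeeping. */
struct model_cdev_dbg {
	int current_state;	/* negative means "initial state unknown" */
	int records;		/* duration records created so far */
	int total;		/* state transitions accounted so far */
};

/* Example callback that succeeds and reports state 3 (assumed value). */
static int model_get_state_ok(struct model_cdev *cdev, unsigned long *state)
{
	(void)cdev;
	*state = 3;
	return 0;
}

/* Example callback that fails, as _FST is reported to do before _FSL. */
static int model_get_state_fail(struct model_cdev *cdev, unsigned long *state)
{
	(void)cdev;
	(void)state;
	return -22;	/* assumed errno-style error code */
}

/* Map the callback result to an int state, using -1 for "unknown". */
static int model_initial_state(struct model_cdev *cdev)
{
	unsigned long val;
	int ret = cdev->get_cur_state(cdev, &val);

	/* val is unsigned, so only the upper bound needs checking here. */
	if (!ret && val <= INT_MAX)
		return (int)val;

	return -1;
}

/* Registration side: only create the initial record for a valid state. */
static void model_debug_cdev_add(struct model_cdev_dbg *dbg, int state)
{
	assert(dbg != NULL);
	dbg->current_state = state;
	dbg->records = 0;
	dbg->total = 0;
	if (state >= 0)
		dbg->records++;
}

/* Update side: skip all accounting while the old state is negative. */
static void model_debug_state_update(struct model_cdev_dbg *dbg, int new_state)
{
	assert(dbg != NULL);
	if (dbg->current_state < 0)
		return;

	dbg->current_state = new_state;
	dbg->total++;
}
```

In this model, a device whose callback fails still goes through model_debug_cdev_add(), only with state -1, which is the point of the change: the registration path no longer depends on the first callback invocation succeeding. Whether and where the real debug code later recovers from the negative state is not visible in the quoted hunks, so the model simply keeps skipping updates.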