Re: [PATCH mainline-linux] thermal/core: Populate max_state before setting up cooling dev sysfs
From: Rafael J. Wysocki
Date: Wed May 20 2026 - 14:40:21 EST
On Wed, May 20, 2026 at 6:58 PM Ovidiu Panait
<ovidiu.panait.rb@xxxxxxxxxxx> wrote:
>
> Since commit 13f4e660a126 ("thermal/core:
> Split __thermal_cooling_device_register() into two functions"),
> thermal_cooling_device_setup_sysfs() is called before the
> ->get_max_state() callback in thermal_cooling_device_add().
> However, cooling_device_stats_setup() allocates space based on
> cdev->max_state, which is not initialized at that point.
>
> With CONFIG_THERMAL_STATISTICS=y, an out of bounds access happens
> inside thermal_cooling_device_stats_update(), followed by a kernel
> crash:
>
> Unable to handle kernel paging request at virtual address ffff800081329e60
> Call trace:
> queued_spin_lock_slowpath+0x1cc/0x320 (P)
> thermal_cooling_device_stats_update+0x28/0xa4
> __thermal_cdev_update+0x74/0x88
> thermal_cdev_update+0x44/0x58
> step_wise_manage+0x1b8/0x300
> __thermal_zone_device_update+0x270/0x414
> thermal_zone_device_check+0x28/0x40
> process_one_work+0x150/0x290
> worker_thread+0x18c/0x300
> kthread+0x114/0x120
> ret_from_fork+0x10/0x20
>
> To fix this, restore the original ordering of ->get_max_state() and
> thermal_cooling_device_setup_sysfs(). Note that with this reordering,
> the dev_set_name() and ->get_max_state() error paths now reach
> thermal_cdev_release() without setup_sysfs() having run. This is safe
> because cdev->stats is NULL in that case and destroy_sysfs() is a no-op.
>
> Fixes: 13f4e660a126 ("thermal/core: Split __thermal_cooling_device_register() into two functions")
> Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@xxxxxxxxxxx>
> ---
> drivers/thermal/thermal_core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 4e2a17fdb6a7..9b9fa51067bd 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1015,8 +1015,6 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
> device_initialize(&cdev->device);
> cdev->devdata = devdata;
>
> - thermal_cooling_device_setup_sysfs(cdev);
> -
> ret = dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
> if (ret)
> goto out_put_device;
> @@ -1037,6 +1035,8 @@ static int thermal_cooling_device_add(struct thermal_cooling_device *cdev, void
> if (ret)
> current_state = ULONG_MAX;
>
> + thermal_cooling_device_setup_sysfs(cdev);
> +
> ret = device_add(&cdev->device);
> if (ret)
> goto out_put_device;
> --
The fixed commit is in linux-next only ATM, not in the mainline, but
the problem addressed by the patch is genuine AFAICS.
Applied, thanks!