Re: [PATCH] thermal: core: Delay exposing sysfs interface
From: Rafael J. Wysocki
Date: Wed Mar 12 2025 - 16:25:49 EST
On Sat, Mar 8, 2025 at 2:02 AM Lucas De Marchi <lucas.demarchi@xxxxxxxxx> wrote:
>
> There's a race between initializing the governor and userspace accessing
> the sysfs interface. From time to time the Intel graphics CI shows this
> signature:
>
> <1>[] #PF: error_code(0x0000) - not-present page
> <6>[] PGD 0 P4D 0
> <4>[] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> <4>[] CPU: 3 UID: 0 PID: 562 Comm: thermald Not tainted 6.14.0-rc4-CI_DRM_16208-g7e37396f86d8+ #1
> <4>[] Hardware name: Intel Corporation Twin Lake Client Platform/AlderLake-N LP5 RVP, BIOS TWLNFWI1.R00.5222.A01.2405290634 05/29/2024
> <4>[] RIP: 0010:policy_show+0x1a/0x40
>
> thermald tries to read the policy file after the sysfs files have been
> created but before the governor has been set by thermal_set_governor(),
> which causes a NULL pointer dereference in policy_show().
>
> As is already done for the hwmon interface, delay exposing the sysfs
> files until the governor has been set.
>
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13655
> Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
> ---
> The race window is small. I was able to reproduce the crash and confirm
> the fix as follows:
>
> 1) Add a udelay() in thermal_zone_device_register_with_trips() to widen
>    the race window
> 2) Run a busy loop that reads the policy file:
>
> $ while true; do
>       cat /sys/devices/virtual/thermal/thermal_zone0/policy > /dev/null 2>&1
>   done
> 3) Rebind processor_thermal_device_pci
> ---
> drivers/thermal/thermal_core.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 2328ac0d8561b..f96ca27109288 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1589,26 +1589,26 @@ thermal_zone_device_register_with_trips(const char *type,
>
>  	tz->state = TZ_STATE_FLAG_INIT;
>
> +	result = dev_set_name(&tz->device, "thermal_zone%d", tz->id);
> +	if (result)
> +		goto remove_id;
> +
> +	thermal_zone_device_init(tz);
> +
> +	result = thermal_zone_init_governor(tz);
> +	if (result)
> +		goto remove_id;
> +
>  	/* sys I/F */
>  	/* Add nodes that are always present via .groups */
>  	result = thermal_zone_create_device_groups(tz);
>  	if (result)
>  		goto remove_id;
>
> -	result = dev_set_name(&tz->device, "thermal_zone%d", tz->id);
> -	if (result) {
> -		thermal_zone_destroy_device_groups(tz);
> -		goto remove_id;
> -	}
> -	thermal_zone_device_init(tz);
>  	result = device_register(&tz->device);
>  	if (result)
>  		goto release_device;
>
> -	result = thermal_zone_init_governor(tz);
> -	if (result)
> -		goto unregister;
> -
>  	if (!tz->tzp || !tz->tzp->no_hwmon) {
>  		result = thermal_add_hwmon_sysfs(tz);
>  		if (result)
>
> ---
Applied as 6.15 material, thanks!
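
The ordering problem in the report can be modeled outside the kernel. What follows is a minimal userspace sketch in C11, not the thermal core's code. The names struct zone, policy_read(), zone_register_buggy(), zone_register_fixed() and read_policy_hook() are all hypothetical. An atomic flag stands in for device_register() making the sysfs files visible, and the hook stands in for thermald reading the policy file inside the (udelay-widened) race window.

```c
/*
 * Hypothetical userspace model of the registration ordering; none of
 * these names exist in drivers/thermal. "visible" models the sysfs
 * files existing, and the hook models a concurrent reader (thermald)
 * running between the two registration steps.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct governor {
	const char *name;
};

struct zone {
	const struct governor *governor; /* set by the governor init step */
	atomic_bool visible;             /* models the sysfs files existing */
};

/*
 * Models policy_show(): returns NULL if the file does not exist yet,
 * or a marker string if it exists but no governor is set (the case
 * where the kernel dereferenced NULL and oopsed).
 */
static const char *policy_read(struct zone *tz)
{
	if (!atomic_load_explicit(&tz->visible, memory_order_acquire))
		return NULL;              /* file not there: open() would fail */
	if (!tz->governor)
		return "<null governor>"; /* kernel would oops here */
	return tz->governor->name;
}

typedef void (*reader_hook)(struct zone *tz, const char **seen);

/* Old order: expose the files first, then set the governor. */
static void zone_register_buggy(struct zone *tz, const struct governor *gov,
				reader_hook hook, const char **seen)
{
	atomic_store_explicit(&tz->visible, true, memory_order_release);
	hook(tz, seen);                   /* reader runs inside the window */
	tz->governor = gov;
}

/* New order: set the governor first, then expose the files. */
static void zone_register_fixed(struct zone *tz, const struct governor *gov,
				reader_hook hook, const char **seen)
{
	tz->governor = gov;
	hook(tz, seen);                   /* same point, nothing visible yet */
	atomic_store_explicit(&tz->visible, true, memory_order_release);
}

/* Reader used as the hook: records what a read would have returned. */
static void read_policy_hook(struct zone *tz, const char **seen)
{
	*seen = policy_read(tz);
}
```

Because the reader runs at a fixed point between the two registration steps, the interleaving is deterministic: with the old order it finds the file but no governor, while with the new order the file simply does not exist yet. The shell loop in the report hits the same window only probabilistically.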