Re: [PATCH v2] thermal: Fix a NULL pointer dereference

From: Subbaraman Narayanamurthy
Date: Fri Sep 17 2021 - 16:06:58 EST


On 9/17/21 2:31 AM, Daniel Lezcano wrote:
> On 07/09/2021 21:01, Subbaraman Narayanamurthy wrote:
>> of_parse_thermal_zones() parses the thermal-zones node and registers a
>> thermal_zone device for each subnode. However, if a thermal zone is
>> consuming a thermal sensor and that thermal sensor device hasn't probed
>> yet, an attempt to set trip_point_*_temp for that thermal zone device
>> can cause a NULL pointer dereference. Fix it.
>>
>> console:/sys/class/thermal/thermal_zone87 # echo 120000 > trip_point_0_temp
>> ...
>> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
>> ...
>> Call trace:
>> of_thermal_set_trip_temp+0x40/0xc4
>> trip_point_temp_store+0xc0/0x1dc
>> dev_attr_store+0x38/0x88
>> sysfs_kf_write+0x64/0xc0
>> kernfs_fop_write_iter+0x108/0x1d0
>> vfs_write+0x2f4/0x368
>> ksys_write+0x7c/0xec
>> __arm64_sys_write+0x20/0x30
>> el0_svc_common.llvm.7279915941325364641+0xbc/0x1bc
>> do_el0_svc+0x28/0xa0
>> el0_svc+0x14/0x24
>> el0_sync_handler+0x88/0xec
>> el0_sync+0x1c0/0x200
>>
>> While at it, fix the possible NULL pointer dereference in other
>> functions as well: of_thermal_get_temp(), of_thermal_set_emul_temp(),
>> of_thermal_get_trend().
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Suggested-by: David Collins <quic_collinsd@xxxxxxxxxxx>
>> Signed-off-by: Subbaraman Narayanamurthy <quic_subbaram@xxxxxxxxxxx>
>> ---
>> Changes for v2:
>> - Added checks in of_thermal_get_temp(), of_thermal_set_emul_temp(), of_thermal_get_trend().
>>
>> drivers/thermal/thermal_of.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
>> index 6379f26..9233f7e 100644
>> --- a/drivers/thermal/thermal_of.c
>> +++ b/drivers/thermal/thermal_of.c
>> @@ -89,7 +89,7 @@ static int of_thermal_get_temp(struct thermal_zone_device *tz,
>> {
>> struct __thermal_zone *data = tz->devdata;
>>
>> - if (!data->ops->get_temp)
>> + if (!data->ops || !data->ops->get_temp)
> comment (1)
>
> AFAICT, if data->ops != NULL then data->ops->get_temp is also != NULL
> because of the code allocating/freeing the ops structure.
>
> The tests can be replaced by (!data->ops), no ?

Thanks Daniel for reviewing the patch.

I agree that even if a sensor module is unregistered, that would call
"thermal_zone_of_sensor_unregister" which would eventually set NULL on
get_temp() and get_trend() and "tzd->ops" as well.

However, of_thermal_get_temp() is trying to call "data->ops->get_temp"
which comes from a sensor driver when it registers. There is no
guarantee that it would be non-NULL right?

Thinking of which, I think having both checks would be valid.

>
>> return -EINVAL;
>>
>> return data->ops->get_temp(data->sensor_data, temp);
>> @@ -186,6 +186,9 @@ static int of_thermal_set_emul_temp(struct thermal_zone_device *tz,
>> {
>> struct __thermal_zone *data = tz->devdata;
>>
>> + if (!data->ops || !data->ops->set_emul_temp)
>> + return -EINVAL;
>> +
> comment (2)
>
> The test looks pointless here (I mean both of them).
>
> If of_thermal_set_emul_temp() is called it is because the callback was
> set in thermal_zone_of_add_sensor().
>
> This one does:
>
> tz->ops = ops;
>
> and
> if (ops->set_emul_temp)
> tzd->ops->set_emul_temp = of_thermal_set_emul_temp;
>
> If I'm not wrong if we are called, then data->ops &&
> data->ops->set_emul_temp is always true, right ?
>

I've not exercised this condition yet. However, the original problem
we've observed was when thermal HAL was trying to set trip thresholds
on a thermal zone device for which the sensor device is not probed yet.
This had happened randomly because of vendor modules taking time to be
loaded and probed. I don't know if there would be any userspace entity
that can try to set emulated temperature for a thermal zone even before
a sensor device is probed.

Without a sensor driver probed, "tz->ops" would not have a valid pointer
right? So, I think checking for "data->ops" should be good.

Another possibility is, a sensor might not have "set_emul_temp" callback.
So checking for "ops->set_emul_temp" should be still valid.

>> return data->ops->set_emul_temp(data->sensor_data, temp);
>> }
>>
>> @@ -194,7 +197,7 @@ static int of_thermal_get_trend(struct thermal_zone_device *tz, int trip,
>> {
>> struct __thermal_zone *data = tz->devdata;
>>
>> - if (!data->ops->get_trend)
>> + if (!data->ops || !data->ops->get_trend)
>> return -EINVAL;
> Same comment as (1)

of_thermal_get_trend() is trying to call "data->ops->get_trend" which
comes from a sensor driver when it registers. From what I can see,
there are lot of drivers which don't pass "get_trend" in their ops.
So there is no guarantee that it would be non-NULL right?

Thinking of which, I think having both checks would be valid.

>
>> return data->ops->get_trend(data->sensor_data, trip, trend);
>> @@ -301,7 +304,7 @@ static int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
>> if (trip >= data->ntrips || trip < 0)
>> return -EDOM;
>>
>> - if (data->ops->set_trip_temp) {
>> + if (data->ops && data->ops->set_trip_temp) {
> Same comment as (2)

Without a sensor driver probed, "tz->ops" would not have a valid pointer right?
So, I think checking for "data->ops" should be good. Another possibility is, a
sensor device might not have "set_trip_temp" callback but just "set_trips".

So checking for "data->ops->set_trip_temp" might be still valid.

>
>> int ret;
>>
>> ret = data->ops->set_trip_temp(data->sensor_data, trip, temp);
>>
>