Re: [PATCH v2 0/2] Add a generic virtual thermal sensor

From: Daniel Lezcano
Date: Wed Oct 06 2021 - 15:51:23 EST


On 06/10/2021 20:00, Rafael J. Wysocki wrote:
> On Wed, Oct 6, 2021 at 6:06 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>>
>> On 05/10/2021 18:45, Rafael J. Wysocki wrote:
>>> On Mon, Oct 4, 2021 at 3:42 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>>>>
>>>> On 04/10/2021 12:24, Alexandre Bailon wrote:
>>>>>
>>>>> On 9/22/21 10:10 AM, Daniel Lezcano wrote:
>>>>>> On 20/09/2021 15:12, Alexandre Bailon wrote:
>>>>>>> On 9/17/21 4:03 PM, Daniel Lezcano wrote:
>>>>>>>> On 17/09/2021 15:33, Alexandre Bailon wrote:
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> On 9/17/21 2:41 PM, Daniel Lezcano wrote:
>>>>>>>>>> On 17/09/2021 09:27, Alexandre Bailon wrote:
>>>>>>>>>>> This series add a virtual thermal sensor.
>>>>>>>>>>> It could be used to get a temperature using some thermal sensors.
>>>>>>>>>>> Currently, the supported operations are max, min and avg.
>>>>>>>>>>> The virtual sensor could be easily extended to support others
>>>>>>>>>>> operations.
>>>>>>>>>>>
>>>>>>>>>>> Note:
>>>>>>>>>>> Currently, thermal drivers must explicitly register their sensors to
>>>>>>>>>>> make them
>>>>>>>>>>> available to the virtual sensor.
>>>>>>>>>>> This doesn't seem a good solution to me and I think it would be
>>>>>>>>>>> preferable to
>>>>>>>>>>> update the framework to register the list of each available sensors.
>>>>>>>>>> Why must the drivers do that ?
>>>>>>>>> Because there are no central place where thermal sensor are
>>>>>>>>> registered.
>>>>>>>>> The only other way I found was to update thermal_of.c,
>>>>>>>>> to register the thermal sensors and make them available later to the
>>>>>>>>> virtual thermal sensor.
>>>>>>>>>
>>>>>>>>> To work, the virtual thermal need to get the sensor_data the ops from
>>>>>>>>> the thermal sensor.
>>>>>>>>> And as far I know, this is only registered in thermal_of.c, in the
>>>>>>>>> thermal zone data
>>>>>>>>> but I can't access it directly from the virtual thermal sensor.
>>>>>>>>>
>>>>>>>>> How would you do it ?
>>>>>>>> Via the phandles when registering the virtual sensor ?
>>>>>>> As far I know, we can't get the ops or the sensor_data from the phandle
>>>>>>> of a thermal sensor.
>>>>>>> The closest solution I found so far would be to aggregate the thermal
>>>>>>> zones instead of thermal sensors.
>>>>>>> thermal_zone_device has the data needed and a thermal zone could be find
>>>>>>> easily using its name.
>>>>>> Yeah, the concept of the thermal zone and the sensor are very close.
>>>>>>
>>>>>> There is the function in thermal_core.h:
>>>>>>
>>>>>> -> for_each_thermal_zone()
>>>>>>
>>>>>> You should be able for each 'slave' sensor, do a lookup to find the
>>>>>> corresponding thermal_zone_device_ops.
>>>>>>
>>>>>>> But, using a thermal_zone_device, I don't see how to handle module
>>>>>>> unloading.
>>>>>> I think try_module_get() / module_put() are adequate for this situation
>>>>>> as it is done on an external module and we can not rely on the exported
>>>>>> symbols.
>>>>> I don't see how it would be possible to use these functions.
>>>>> The thermal zone doesn't have the data required to use it.
>>>>
>>>> Actually I was able to crash the kernel by doing:
>>>>
>>>> console 1:
>>>>
>>>> while $(true); do insmod <module> && rmmod <module>; done
>>>>
>>>> console 2:
>>>>
>>>> while $(true); cat /sys/class/thermal/thermal_zone0/temp; done
>>>>
>>>> So there is something wrong already in the thermal framework.
>>>
>>> Hmmm.
>>>
>>>> IMO, the first thing would be to fix this critical issue by getting the
>>>> sensor module refcount when the thermal zone is enabled and dropping it
>>>> when it is disabled.
>>>>
>>>> With that fixed, perhaps it will possible to get the device associated
>>>> with the sensor and then try_module_get(dev->driver->owner)
>>>>
>>>>> Maybe a more easier way is to use the thermal_zone_device mutex.
>>>>> If I get a lock before to use the thermal_zone_device ops, I have the
>>>>> guaranty that module won't be unloaded.
>>>
>>> That would be my approach too.
>>
>> The mutex is private to the thermal core. The virtual sensor should not
>> touch it :/
>>
>> Perhaps, it can work with a private spin_lock with a try_spinlock() ?
>
> IIUC this is a case when module A refers to some memory in module B
> and if the latter goes away, an access to that memory from the former
> is a use-after-free, so it is not sufficient to use a local spinlock.

> This can be avoided by having a lock and a flag such that the flag is
> set under the lock by module B when making the memory in question
> available and cleared under the lock when freeing that memory. Then,
> module A needs to check the flag under the lock on every access to
> that memory. Also, the lock and the flag must be accessible all the
> time to both modules (ie. must not go away along with any of them if
> they don't depend on each other).

Just to clarify, the idea behind the virtual sensor is the same as the
network bridge: the network interfaces are not aware they are attached
to a bridge.

The virtual sensor should attach and detach the physical sensors.

The sensors should not un|register themselves from|to the virtual
sensor, nor having specific code related to the virtual sensor. De
facto, all the existing sensors will be compatible with the virtual sensor.

Do we have a solution other than grabbing a module reference and
self-contained to the virtual sensor ?

>>>>> When a "thermal of sensor" is unloaded, it calls
>>>>> thermal_zone_of_sensor_unregister which takes a lock before
>>>>> update ops.
>>>>
>>>> I'm not sure to understand. The goal is to have the refcount on the
>>>> modules to be incremented when the virtual sensor is using them.
>>>
>>> IMO the goal is to prevent the code from crashing when modules get
>>> unloaded. I'm not really sure if refcounts alone are sufficient for
>>> that.
>>
>> The problem is in the loop:
>>
>> +static int virtual_thermal_sensor_get_temp(void *data, int *temperature)
>> +{
>> + struct virtual_thermal_sensor *sensor = data;
>> + int max_temp = INT_MIN;
>> + int temp;
>> + int i;
>> +
>> + for (i = 0; i < sensor->count; i++) {
>> + struct thermal_sensor_data *hw_sensor;
>> +
>> + hw_sensor = &sensor->sensors[i];
>> + if (!hw_sensor->ops)
>> + return -ENODEV;
>> +
>> + hw_sensor->ops->get_temp(hw_sensor->sensor_data, &temp);
>> + max_temp = sensor->aggr_temp(max_temp, temp);
>> + }
>> +
>> + *temperature = max_temp;
>> +
>> + return 0;
>> +}
>>
>> If one of the sensor is unloaded when get_temp is called,
>> hw_sensor->ops->get_temp will crash.
>
> Right.
>
> Dereferencing hw_sensor itself is not safe in this loop if the module
> holding the memory pointed to by it may go away.
>
> However, presumably, the hw_sensor object needs to be registered with
> the core in order to be used here and unregistered when it goes away,
> so it looks like this loop could use a wrapper like
> thermal_get_sensor_temp(hw_sensor, &temp) which would return a
> negative error code if the sensor in question went away.
>
> Of course, that would require the core to check the list of available
> sensors every time, but that may be implemented efficiently with the
> help of an xarray, for example.
>
>> So the proposal is virtual_sensor_add_sensor() does try_get_module()
>> and virtual_sensor_remove_sensor() does put_module().
>>
>> The ref on the 'slave' modules will be release only if the virtual
>> sensor is unregistered.
>>
>> So until, the virtual sensor is unregistered, the 'slaves' modules can
>> not be unloaded.
>>
>> That is what we find with eg. the wifi modules.
>
> Yes, but that's a bit cumbersome from the sysadmin perspective IMO,
> especially when one wants to unload one of the modules and doesn't
> know exactly what other modules hold references to it.

Well, sysadmin are not supposed to unload a thermal sensor and I would
say if he unloads the module, which is not an usual operation, up to him
to 'lsmod' and check what module is holding the reference.

> IIUC ref_module() might be nicer, but it is still more convenient if
> the modules can just go away independently.

"for desesperate users" :)


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog