Re: Thermal driver with safeguards

From: Daniel Lezcano
Date: Fri Jan 10 2025 - 12:43:46 EST


On 10/01/2025 17:56, Werner Sembach wrote:
Hi Daniel,

On 09.01.25 at 22:36, Daniel Lezcano wrote:
On 02/12/2024 15:52, Werner Sembach wrote:
Hi,

given a pair of a temperature sensor and a fan, I want to implement a driver that allows userspace to directly control the fan if it wants to, but enforces a minimum fan speed when certain high temperatures are reached, to avoid crashes or hardware damage.

From userspace, directly use the thermal-engine, which is currently under development [1]. You can add your platform-specific code in a plugin, and the thermal engine will catch all the thermal events and pass them to it [2].

The thermal engine has a configuration file that sets up the thermal framework so that the engine is woken up at different temperatures.
That still requires trusting userspace/the user not to write dangerous values directly to sysfs?

No, these are not trip points but temperature thresholds. So if the firmware defines trip points, userspace cannot change them.

Userspace thresholds are new: https://lwn.net/Articles/986009/

The thermal engine will be proposed as a distro package, so platform support will be available automatically.

Besides, trip points can be set up in the device to act on higher temperatures.
As far as I can tell, these trip points only notify userspace, but you can't attach code executed in the kernel to them.

[ ... ]

What is unclear is how the fan is managed. I suggest having a look at pwm-fan.c in drivers/hwmon.

I already looked at hwmon, but that basically just passes values through from and to userspace and has no kernel-side management of temperatures and fan speeds whatsoever.

IIUC, your request was about having userspace deal with the fan and the kernel act as a safeguard, taking over the thermal management when the temperature is too high.

Obviously the monitored temperature must be that of a device with a "slow" temperature motion; userspace temperature management is not suitable for fast temperature transitions.

The thermal engine can for example configure different temperatures, let's say: 43°C, 44°C, 46°C, 49°C and 54°C.
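As a rough illustration of what being woken up at those temperatures means, here is a minimal sketch that reports which thresholds were crossed between two temperature samples. The helper name `crossed_thresholds` and the "up"/"down" labels are hypothetical and are not taken from the thermal engine or the kernel interface.

```python
# Thresholds in degrees Celsius, taken from the example above.
THRESHOLDS = [43, 44, 46, 49, 54]


def crossed_thresholds(prev_temp, new_temp, thresholds=THRESHOLDS):
    """Return (threshold, direction) pairs crossed between two samples.

    Hypothetical helper: a threshold counts as crossed "up" when the
    temperature moves from below it to at-or-above it, and "down" for
    the opposite move.
    """
    events = []
    if new_temp > prev_temp:
        for t in thresholds:
            if prev_temp < t <= new_temp:
                events.append((t, "up"))
    elif new_temp < prev_temp:
        for t in reversed(thresholds):
            if new_temp < t <= prev_temp:
                events.append((t, "down"))
    return events
```

A userspace plugin could react to each returned pair, for example by picking a new fan speed.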

Then the DT describes additional trip points for mitigation; one trip point for mitigation could be enough (e.g. 80°C). Then a "hot" trip point to send the thermal engine a notification that the temperature is getting really high, so it can take some userspace action like killing an application, and finally a "critical" trip point to shut down the system.

The fan would be a cooling device with 0-100 values representing the speed in percentage. The trip point at 80°C would be associated with the fan with the <0, 100> cooling states.
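To make the percentage states concrete, here is a minimal sketch of mapping a 0-100 cooling state to a PWM duty value. It assumes the fan is driven through a hwmon-style pwm attribute with a 0-255 range; the function name `state_to_pwm` is hypothetical, and a real driver may define its cooling levels differently.

```python
PWM_MAX = 255  # assumption: hwmon-style pwm attribute with a 0-255 range


def state_to_pwm(state, max_state=100):
    """Map a cooling state (0..max_state, in percent) to a PWM duty value."""
    if not 0 <= state <= max_state:
        raise ValueError("cooling state out of range")
    # Integer arithmetic with rounding to the nearest duty value.
    return (state * PWM_MAX + max_state // 2) // max_state
```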

The dynamic of the thermal management could be the following:

The temperature changes and stays within the [35°C - 60°C] boundaries. The thermal engine receives the events at the different aforementioned temperatures and acts on the PWM fan via hwmon.

If for any reason the temperature goes above 80°C, the kernel takes over the management at that moment and increases/decreases the fan speed within the 0% - 100% limits until the temperature goes back below 80°C.
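A simplified sketch of such an increase/decrease policy is shown here. It is only loosely inspired by a step-wise approach and is not the kernel's actual governor; the function name, the 10% step and the polling model are assumptions made for illustration.

```python
def next_fan_state(temp, prev_temp, state, trip=80, step=10):
    """Return the next fan state (0-100 %) for one polling interval.

    Assumed policy: add cooling while above the trip point and still
    heating, hold while above the trip point but stable or cooling,
    and release cooling step by step once below the trip point.
    """
    if temp >= trip and temp > prev_temp:
        state += step
    elif temp < trip:
        state -= step
    # Clamp to the <0, 100> cooling state limits.
    return max(0, min(100, state))
```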

If it continues to increase and reaches the "hot" trip point, then an event is sent to userspace, which should take an action to reduce the temperature (kill the application, reduce the battery charge, drop the frame rate, etc.).

If it continues to increase and reaches the "critical" trip point, then the system shuts down.

If the temperature decreases and goes below 80°C, then it returns to the normal state and the thermal engine can continue its work.
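The hand-off between userspace and the kernel described above can be summarized in a minimal sketch. Only the 80°C mitigation trip point comes from this mail; the "hot" and "critical" values and the function name `who_acts` are hypothetical, and treating exactly 80°C as already in mitigation is an assumption.

```python
MITIGATION_TRIP_C = 80  # from the example in this mail
HOT_TRIP_C = 90         # hypothetical value, for illustration only
CRITICAL_TRIP_C = 100   # hypothetical value, for illustration only


def who_acts(temp_c):
    """Return which party is expected to act at a given temperature."""
    if temp_c >= CRITICAL_TRIP_C:
        return "shutdown"
    if temp_c >= HOT_TRIP_C:
        return "kernel+hot-event"  # kernel mitigates, userspace is notified
    if temp_c >= MITIGATION_TRIP_C:
        return "kernel"            # kernel drives the fan
    return "userspace"             # thermal engine drives the fan via hwmon


# Example trace: normal operation, a spike, then a return to normal.
trace = [50, 79, 80, 92, 75]
states = [who_acts(t) for t in trace]
```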

Does it make sense?


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs