Re: Thermal driver with safeguards
From: Daniel Lezcano
Date: Fri Jan 10 2025 - 12:43:46 EST
On 10/01/2025 17:56, Werner Sembach wrote:
Hi Daniel,
On 09.01.25 at 22:36, Daniel Lezcano wrote:
On 02/12/2024 15:52, Werner Sembach wrote:
Hi,
given a pair of a temperature sensor and a fan, I want to implement a
driver that allows userspace to directly control the fan if it wants
to, but enforces a minimum fan speed when certain high temperatures are
reached, to avoid crashes or hardware damage.
From userspace, use the thermal-engine directly, which is currently
under development [1]. You can add your platform-specific code in a
plugin, while the thermal engine catches all the thermal events and
passes them to it [2].
The thermal engine has a configuration file which sets up the
thermal framework to be woken up at different temperatures.
Doesn't that still require trusting userspace/the user not to write
dangerous values directly to sysfs?
No, these are not trip points but temperature thresholds. So if the
firmware defines trip points, userspace cannot change them.
Userspace thresholds are new: https://lwn.net/Articles/986009/
The thermal engine will be proposed as a distro package, so platform
support will come automatically.
Besides, trip points can be set up in the device to act at higher
temperatures.
As far as I can tell, these trip points only notify userspace; you
can't attach code executed in the kernel to them.
[ ... ]
What is unclear is how the fan is managed. I suggest having a look at
pwm-fan.c in drivers/hwmon.
I already looked at hwmon, but it basically just passes values through
from and to userspace and has no kernel-side management of temperatures
and fan speeds whatsoever.
IIUC, your request was about having userspace deal with the fan and
the kernel act as a safeguard, taking over the thermal management
when the temperature gets too high.
Obviously, the monitored temperature must belong to a device whose
temperature changes "slowly"; userspace temperature management is not
suitable for fast temperature transitions.
The thermal engine can, for example, configure different thresholds,
let's say: 43°C, 44°C, 46°C, 49°C and 54°C.
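To illustrate how the thermal engine would get woken up at those
thresholds, here is a small userspace model in C. It is only a sketch:
the threshold values come from the example above, expressed in
millidegrees Celsius as the thermal framework reports temperatures,
while the helper thresholds_crossed() is hypothetical and not part of
the kernel or thermal-engine API. It counts how many thresholds a
temperature change went through, positive when heating and negative
when cooling.

```c
#include <stddef.h>

/* Example thresholds from the mail, in millidegrees Celsius
 * (the unit used by the Linux thermal framework). */
static const int thresholds[] = { 43000, 44000, 46000, 49000, 54000 };
#define NR_THRESHOLDS (sizeof(thresholds) / sizeof(thresholds[0]))

/*
 * Hypothetical helper: count the thresholds crossed between two
 * temperature samples. Each upward crossing adds one, each downward
 * crossing subtracts one. A real consumer would more likely handle
 * one event per crossed threshold than a single count.
 */
static int thresholds_crossed(int prev, int cur)
{
	int crossed = 0;

	for (size_t i = 0; i < NR_THRESHOLDS; i++) {
		int t = thresholds[i];

		if (prev < t && cur >= t)
			crossed++;	/* heating up through t */
		else if (prev >= t && cur < t)
			crossed--;	/* cooling down through t */
	}
	return crossed;
}
```

For example, going from 42°C to 45°C crosses 43°C and 44°C, so this
model reports two upward crossings.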
Then the DT describes additional trip points. One trip point for
mitigation could be enough (e.g. 80°C). Another one, "hot", sends the
thermal engine a notification that the temperature is getting really
high so it can take some userspace action like killing an application.
Finally, a "critical" trip point shuts down the system.
The fan would be a cooling device with 0-100 states representing the
speed in percent. The trip point at 80°C would be bound to the fan
with the <0, 100> cooling states.
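The fan cooling device described above could be modelled roughly as
follows. This is a hypothetical userspace sketch, not kernel code:
struct fan_cdev and the fan_* functions are made-up names that only
mirror the shape of the get_max_state/get_cur_state/set_cur_state
callbacks of struct thermal_cooling_device_ops. It also assumes the
percentage is scaled to the 0-255 range used by the hwmon pwm
attributes.

```c
#include <errno.h>

#define FAN_MAX_STATE	100	/* cooling states = fan speed in percent */
#define PWM_MAX		255	/* hwmon pwmN attributes range from 0 to 255 */

/* Hypothetical model of the fan cooling device. */
struct fan_cdev {
	unsigned long cur_state;	/* current cooling state (percent) */
	unsigned int pwm;		/* value that would go to the PWM */
};

static int fan_get_max_state(struct fan_cdev *fan, unsigned long *state)
{
	(void)fan;
	*state = FAN_MAX_STATE;
	return 0;
}

static int fan_get_cur_state(struct fan_cdev *fan, unsigned long *state)
{
	*state = fan->cur_state;
	return 0;
}

static int fan_set_cur_state(struct fan_cdev *fan, unsigned long state)
{
	if (state > FAN_MAX_STATE)
		return -EINVAL;	/* reject states outside <0, 100> */

	fan->cur_state = state;
	/* Scale percent to the 0..255 PWM range, rounding to nearest. */
	fan->pwm = (unsigned int)((state * PWM_MAX + FAN_MAX_STATE / 2) /
				  FAN_MAX_STATE);
	return 0;
}
```

With this mapping, state 50 gives a PWM value of 128 and state 100
gives the full 255.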
The dynamics of the thermal management could be the following:
The temperature is changing and stays within the [35°C - 60°C] range.
The thermal engine receives the events at the aforementioned
temperatures and acts on the PWM fan via hwmon.
If for any reason the temperature goes above 80°C, the kernel takes
over the management at that moment and increases/decreases the fan
speed within the 0% - 100% limits until the temperature goes below
80°C.
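The kernel-side part of that dynamic could look roughly like the step
below, called once per polling period. It is a simplified sketch
loosely inspired by the step_wise governor, not its actual
implementation; mitigation_step() and the 10% step size are
assumptions made for illustration.

```c
#include <stdbool.h>

#define MITIGATION_TRIP	80000	/* 80°C trip point from the mail, in m°C */
#define FAN_MIN		0	/* cooling state 0: fan speed 0% */
#define FAN_MAX		100	/* cooling state 100: fan speed 100% */
#define FAN_STEP	10	/* hypothetical step per polling period */

/*
 * One polling period of kernel-side mitigation. Returns the new fan
 * state in percent and sets *kernel_owned to tell whether the kernel
 * is currently in charge of the fan.
 */
static int mitigation_step(int temp, int prev_temp, int state,
			   bool *kernel_owned)
{
	if (temp < MITIGATION_TRIP) {
		/* Below the trip: leave the fan alone, userspace drives it. */
		*kernel_owned = false;
		return state;
	}

	*kernel_owned = true;
	if (temp > prev_temp)
		state += FAN_STEP;	/* still heating: spin faster */
	else if (temp < prev_temp)
		state -= FAN_STEP;	/* cooling down: back off */

	/* Clamp to the <0, 100> cooling states bound to the trip. */
	if (state > FAN_MAX)
		state = FAN_MAX;
	if (state < FAN_MIN)
		state = FAN_MIN;
	return state;
}
```

Below the trip the function leaves the fan state untouched and only
reports that userspace is in charge again.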
If it continues to increase and reaches the "hot" trip point, then an
event is sent to userspace, which should take an action to reduce the
temperature (kill the application, reduce the battery charge, drop the
frame rate, etc.).
If it continues to increase and reaches the "critical" trip point, then
the system shuts down.
If the temperature decreases and goes below 80°C, then it returns to the
normal state and the thermal engine can continue its work.
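The whole sequence boils down to a mapping from temperature to who is
in charge. In this sketch the 80°C value comes from the example above,
while the "hot" (90°C) and "critical" (100°C) temperatures are
hypothetical placeholders, as are the enum and classify() names.

```c
/* Trip temperatures in m°C. 80°C is from the mail; the "hot" and
 * "critical" values are hypothetical placeholders. */
#define TRIP_PASSIVE	80000
#define TRIP_HOT	90000
#define TRIP_CRITICAL	100000

enum thermal_action {
	ACT_USERSPACE,	/* thermal engine drives the fan via hwmon */
	ACT_KERNEL_FAN,	/* kernel governs the fan between 0% and 100% */
	ACT_NOTIFY_HOT,	/* kernel keeps the fan; userspace told to act */
	ACT_SHUTDOWN,	/* critical trip reached: system shuts down */
};

/* Check from the highest trip down so the most severe action wins. */
static enum thermal_action classify(int temp)
{
	if (temp >= TRIP_CRITICAL)
		return ACT_SHUTDOWN;
	if (temp >= TRIP_HOT)
		return ACT_NOTIFY_HOT;
	if (temp >= TRIP_PASSIVE)
		return ACT_KERNEL_FAN;
	return ACT_USERSPACE;
}
```

Dropping back below 80°C returns ACT_USERSPACE, which corresponds to
the thermal engine resuming its work.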
Does it make sense?
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog