Hi Armin, Werner,
On 1-Dec-24 6:58 PM, Armin Wolf wrote:
Am 27.11.24 um 12:21 schrieb Werner Sembach:
Hi,Hi,
Following up to https://lore.kernel.org/all/172b7acd-4313-4924-bcbc-41b73b39ada0@xxxxxxxxxxxxxxxxxxx/ and https://lore.kernel.org/all/f26d867e-f247-43bb-a78b-be0bce35c973@xxxxxxxxxxxx/ I experimented with the thermal subsystem and these are my results so far, but I'm hitting a bit of a wall:
As far as I can tell to implement "2. As long as GTMP is > 80°C fan speed must be at least 30%." I would need to add a new gevenor, lets call it "user_space_with_safeguards". I would be nice when the temp <-> fanspeed relation could be passed via the thermal_trip structure. And safeguarding the hardware from userspace only works when I can restrict userspace from just selecting the preexisting "user_space" govenor.
So my ideas/questions:
- Add a field "min_fanspeed_percent" to the thermal_trip struct that will only be used by the "user_space_with_safeguards" govenor
- Add a "user_space_with_safeguards" govenor that is the same as the "user_space" govenor, but on trip, a minimum speed is applied
- How can i ensure that on further speed updates the min speed is applied to? I could just implement it directly in the cdev, but that would be spagetti coding around the govenor.
- Can I somehow restrict userspace from using certain govenors?
- I'm a litte bit confused about the thermal zone "mode" sysfs switch, here it says deactivate for userspace control: https://elixir.bootlin.com/linux/v6.12/source/Documentation/ABI/testing/sysfs-class-thermal#L20, but what about the user_space govenor then?
i am having little experience with the thermal subsystem, but i suggest that policy decisions like "min_fanspeed_percent" should either:
- come from the hardware/firmware itself
- come from userspace
Effectively this driver tries to enforce a Tuxedo-specific policy that is not directly based on hardware limits. The book "Linux Device Drivers"
says:
"the role of a device driver is providing///mechanism/, not/policy/."
Furthermore:
"When/writing/ drivers, a programmer should pay particular attention to this fundamental concept: write kernel code to access the hardware,
but don't force particular policies on the user, since different users have different needs. The driver should deal with making the hardware
available, leaving all the issues about/how/ to use the hardware to the applications. A driver, then, is flexible if it offers access to the
hardware capabilities without adding constraints."
The issue is that the Tuxedo-specific policy is not directly connected with the hardware. Some hardware might need a bigger minimum fan speed than
other hardware for example.
But then the driver still needs to enforce them.
The hardware (or rather firmware in this case) should communicate those constraints to the linux driver so that we do not need to rely on random
temperatures for hardware protection.
This ACPI interface however basically provides us with a hwmon interface and Tuxedo now wants the kernel to enforce their policy on it. I suspect that
happens for warranty reasons, right?
That's no enforcement as userspace can just write the sysfs directly.
Maybe there is a way to enforce this policy through userspace applications?
An important role of device-drivers is also to avoid driving hardware outsideAck
its valid specifications. E.g. charger drivers in the power_supply subsystem
have constant_charge_current and constant_charge_voltage settings which
give a max charging speed in Amperes and a max charging Voltage (typically
4.2V for single Li-xxx cell batteries).
Setting these too high can be very dangerous leading to batteries catching
fire / exploding. so naturally the drivers also have and enforce
constant_charge_current_max and constant_charge_voltage_max sysfs attributes
and the values there do not reflect the maximum what the charger-chip can
handle, but the maximum which the manufacturer has specified for the battery.
This info can e.g. come from devicetree but if not present then the values
applied by the firmware at boot should be read back and used not only
as active, but also as max values.
I believe what Werner wants is pretty much the same thing, allow userspace
to provide its own fan management, but have the kernel monitor the temperature
and if the temperature goes say above 80° Celsius, enforce a min fan-speed of
30% and at 90° enforce a fan-speed of say 70%.
Yes the exact values are a for of policy, but this is a policy to protect
hw from getting damaged and as such is fine to have in the kernel.
As for how to implement this with the thermal subsystem (which indeed looks
like it might be the way to go) I'm not familiar enough with the thermal
subsystem either.
Werner, I suggest that you send an email to the thermal subsystem maintainers
and ask them about this. It would be good to start with explaining your
problem before asking the questions. I would leave out details about the ACPI
interface. Just say you have a temp-sensor + pwm to control fan speed combination
where you want to allow userspace control while having the kernel monitor
things and have the kernel overriding userspace's fan speed setting when
things get too hot, to deal with e.g. the userspace process crashing at
a low fan speed setting.
Best regards,
Regards,
Hans