Re: [PATCH] thermal/core: Introduce user trip points

From: Rafael J. Wysocki
Date: Tue Jul 02 2024 - 06:22:45 EST


On Tue, Jul 2, 2024 at 11:29 AM Daniel Lezcano
<daniel.lezcano@xxxxxxxxxx> wrote:
>
> On 01/07/2024 18:26, Rob Herring wrote:
> > On Thu, Jun 27, 2024 at 10:54:50AM +0200, Daniel Lezcano wrote:
> >> Currently the thermal framework has 4 trip point types:
> >>
> >> - active : basically for fans (or anything requiring energy to cool
> >> down)
> >>
> >> - passive : a performance limiter
> >>
> >> - hot : for a last action before reaching critical
> >>
> >> - critical : a point-of-no-return threshold leading to a system shutdown
> >>
> >> A thermal zone monitors the temperature regarding these trip
> >> points. The old way to do that is actively polling the temperature
> >> which is very bad for embedded systems, especially mobile and it is
> >> even worse today as we can have more than fifty thermal zones. The
> >> modern way is to rely on the driver to send an interrupt when the trip
> >> points are crossed, so the system can sleep while the temperature
> >> monitoring is offloaded to dedicated hardware.
> >>
> >> However, the thermal aspect is also managed from userspace to protect
> >> the user, especially tracking down the skin temperature sensor. The
> >> logic is more complex than what we find in the kernel because it
> >> needs multiple sources indicating the thermal situation of the entire
> >> system.
> >>
> >> For this reason it needs to set up trip points at different levels in
> >> order to get informed about what is going on with some thermal zones
> >> when running some specific application.
> >>
> >> For instance, the skin temperature must be limited to 43°C on a long
> >> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
> >>
> >> The thermal engine must then rely on trip points to monitor those
> >> temperatures. Unfortunately, today there are only 'active' and
> >> 'passive' trip points, which have a specific meaning for the kernel,
> >> not for userspace. That leads to hacks on different platforms for
> >> mobile and embedded systems where 'active' trip points are used to
> >> send notifications to userspace. This is obviously not right because
> >> these trips are handled by the kernel.
> >>
> >> This patch introduces the 'user' trip point type, whose semantics are
> >> simple: do nothing at the kernel level, just send a notification to
> >> userspace.
> >
> > Sounds like OS behavior/policy, though I guess the existing ones kind
> > of are too. Maybe we should have defined *what* action to take and
> > then the OS could decide which actions to handle vs. pass up a level.
>
> Right
>
> > Why can't userspace just ask to be notified at a trip point it
> > defines?
>
> Yes I think it is possible to create a netlink message to create a trip
> point which will return a trip id.
>
> Rafael, what do you think?

Trips cannot be created on the fly ATM.

What can be done is to create trips that are invalid to start with and
then set their temperature via sysfs. This has been done already for
quite a while AFAICS.
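
For illustration, a minimal userspace sketch of that approach. It assumes
the zone exposes a writable trip_point_N_temp attribute, which depends on
the kernel configuration and on the driver marking the trip as writable,
and that the caller has the privileges to write it. The helper names and
the zone/trip indices are hypothetical; sysfs temperatures are expressed
in millidegrees Celsius.

```python
from pathlib import Path

# Standard sysfs location of the thermal class; overridable for testing.
SYSFS_THERMAL = Path("/sys/class/thermal")


def trip_temp_path(zone, trip, root=SYSFS_THERMAL):
    """Build the path of the trip temperature attribute for a zone/trip pair."""
    return Path(root) / f"thermal_zone{zone}" / f"trip_point_{trip}_temp"


def set_trip_temp(zone, trip, celsius, root=SYSFS_THERMAL):
    """Write a trip temperature, converting degrees C to millidegrees C.

    Returns the value written. Raises OSError if the attribute is missing,
    read-only, or the kernel rejects the value.
    """
    millideg = int(round(celsius * 1000))
    trip_temp_path(zone, trip, root).write_text(f"{millideg}\n")
    return millideg


def get_trip_temp(zone, trip, root=SYSFS_THERMAL):
    """Read back the trip temperature in millidegrees C."""
    return int(trip_temp_path(zone, trip, root).read_text().strip())
```

With that, something like set_trip_temp(0, 2, 43.0) would give a
previously invalid trip 2 of thermal_zone0 a temperature of 43°C, provided
the attribute is writable on the platform in question.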

> > If we keep this in DT, perhaps 'notice' would be a better name that
> > doesn't encode the OS architecture details.
>
> [ ... ]
>
> > BTW, can we decide what to do about 'trips' node being required or not?
> > That's nearly the only DT warning left for some platforms.
>
> A thermal zone is a combination of a sensor, a mitigation logic (user or
> kernel), and hardware limits with trip points to activate the logic.
> Without trip points, this logic cannot operate; consequently the thermal
> zone description is incomplete.

Well, there is a concept of a tripless thermal zone which simply
represents a sensor.

> I guess those thermal zones are set to have the sensor exported in
> /sys/class/thermal, so the userspace can access the temperature.

I think so.

> However, existing thermal zone descriptions should have at least a 'hot'
> trip point and a 'critical' trip point.
>
> On the other hand, now that we are introducing the 'user' trip point,
> those thermal zones can exist without trip points because we can create
> them at any time from userspace.

No, they cannot be created at any time.

> So at first glance, I would say we can drop the "required"
> constraint for the trip points in the thermal zone description.

That's correct, but for other reasons.