Re: [PATCH v1] thermal: trip: Avoid skipping trips in thermal_zone_set_trips()

From: Rafael J. Wysocki
Date: Tue Jul 30 2024 - 16:17:53 EST


On Tue, Jul 30, 2024 at 6:46 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Tue, Jul 30, 2024 at 4:57 PM Daniel Lezcano
> <daniel.lezcano@xxxxxxxxxx> wrote:
> >
> > On 30/07/2024 16:41, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > >
> > > Say there are 3 trip points A, B, C sorted in ascending temperature
> > > order with no hysteresis. If the zone temerature is exactly equal to
> > > B, thermal_zone_set_trips() will set the boundaries to A and C and the
> > > hardware will not catch any crossing of B (either way) until either A
> > > or C is crossed and the boundaries are changed.
> >
> > When the temperature is B, an interrupt fired which led to the
> > thermal_zone_update() and in turn it calls thermal_zone_set_trips()
> >
> > As the current temperature is equal to trip B, it makes sense to set A
> > and C, as B has been processes when handling the thermal trips right
> > before calling thermal_zone_set_trips()
>
> So say that A, B and C are active trips and the thermal zone uses the
> Bang-bang governor. Say that each trip point has a fan associated
> with it, so that every time it is crossed on the way up, the fan
> should be turned on, and every time it is crossed on the way down, the
> fan should be turned off. Denote these fans as f_A, f_B, and f_C,
> respectively.
>
> Say the initial thermal zone temperature is below A, so the initial
> thermal_zone_set_trips() interval is {-INT_MAX, A}. The zone
> temperature starts to rise and A is reached, so an interrupt triggers.
> __thermal_zone_device_update() runs and it sees that the zone
> temperature is equal to A, so thermal_zone_set_trips() sets the new
> interval to {-INT_MAX, B} and f_A is turned on.
>
> Say the zone temperature is still rising, despite f_A being on, and B
> is reached. __thermal_zone_device_update() runs and it sees that the
> zone temperature is equal to B, so thermal_zone_set_trips() sets the
> new interval to {A, C} and f_B is turned on.
>
> Say the temperature rises somewhat above B, but does not reach C and
> starts to fall down. B is crossed on the way down, so f_B should be
> turned off, but nothing happens, because an interrupt will only
> trigger when A is reached.

Worse yet, if the zone temperature starts to fall down between A and
B, after setting the thermal_zone_set_trips() interval to {-INT_MAX,
B},,nothing will happen when A is crossed on the way down, and since
the low boundary is clearly of the "don't care" type, f_A will not be
turned off until B is reached AFAICS, which may be never.

> > I'm failing to understand the issue you describe
>
> I hope the above helps.
>
> > Were you able to reproduce the issue with emul_temp ?
>
> I haven't tried, but I'm sure I'd be able to reproduce it.