Re: [PATCH v2 2/2] thermal: core: Add sanity check for polling_delay and passive_delay
From: Rafael J. Wysocki
Date: Mon Jul 08 2024 - 10:03:55 EST
On Mon, Jul 8, 2024 at 3:58 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>
> On 08/07/2024 15:38, Rafael J. Wysocki wrote:
> > On Mon, Jul 8, 2024 at 2:12 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
> >>
> >> On 05/07/2024 21:46, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >>>
> >>> If polling_delay is nonzero and passive_delay is 0, the thermal zone
> >>> will use polling except when tz->passive is nonzero, which does not make
> >>> sense.
> >>>
> >>> Also if polling_delay is nonzero and passive_delay is greater than
> >>> polling_delay, the thermal zone temperature will be updated less often
> >>> when tz->passive is nonzero. This does not make sense either.
> >>>
> >>> Ensure that none of the above will happen.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >>> ---
> >>>
> >>> v1 -> v2: The patch actually matches the changelog
> >>>
> >>> ---
> >>> drivers/thermal/thermal_core.c | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>>
> >>> Index: linux-pm/drivers/thermal/thermal_core.c
> >>> ===================================================================
> >>> --- linux-pm.orig/drivers/thermal/thermal_core.c
> >>> +++ linux-pm/drivers/thermal/thermal_core.c
> >>> @@ -1440,6 +1440,9 @@ thermal_zone_device_register_with_trips(
> >>> td->threshold = INT_MAX;
> >>> }
> >>>
> >>> + if (polling_delay && (passive_delay > polling_delay || !passive_delay))
> >>> + passive_delay = polling_delay;
> >>
> >> Given this is a system misconfiguration, it would make more sense to
> >> bail out with -EINVAL. Assigning a default value in the back of the
> >> caller will never raise its attention and can make a bad configuration
> >> staying for a long time.
> >
> > This works except for the case mentioned below.
> >
> > I think that passive_delay > polling_delay can trigger a -EINVAL, but
> > (polling_delay && !passive_delay) cannot do it because it is regarded
> > as a valid case as per the below.
>
> Right I can see ATM only this as an illogic combination:
>
> polling_delay && passive_delay &&
> (polling_delay < passive_delay)
>
> >> That said, there are configurations with a passive delay set to zero but
> >> with a non zero polling delay. For instance, a thermal zone mitigated
> >> with a fan, so active trip points are set. Another example is when there
> >> is only critical trip points for a thermal zone.
> >>
> >> Actually there are multiple combinations with delays value which may
> >> look invalid but which are actually valid.
> >>
> >> For example, a setup with polling_delay > 0, passive_delay = 0, active
> >> trip points, cooling map to this active trips, passive trip points
> >> without cooling map.
> >>
> >> IMHO, it is better to do the configuration the system is asking for,
> >> even if it sounds weird
> >
> > Except that it doesn't work as expected because if passive_delay = 0,
> > polling is paused when tz->passive is set.
>
> Yes, but as there is no cooling map, there is no governor action, thus
> tz->passive is never set.
In current linux-next, it is set when a passive trip is crossed on the way up.
> So we can have a passive polling equal to zero
> without being illegal as no passive mitigation will happen.
>
> The passive delay is really there only if there is a passive cooling
> device mapped to a passive trip point.
Well, shouldn't user space get notified more often when passive
cooling is under way?
> The polling delay is in charge of mitigating the active cooling device
> like a fan. So it is possible to mix an active trip point to mitigate
> with a fan and then put at a higher temperature a passive trip point
> with a higher sampling resolution.
But it is not correct to pause polling when tz->passive is set.