Re: [PATCH v4 01/30] thermal/core: Add a generic thermal_zone_get_trip() function

From: Marek Szyprowski
Date: Fri Sep 23 2022 - 18:19:46 EST


Hi Daniel,

On 21.09.2022 11:42, Daniel Lezcano wrote:
> The thermal_zone_device_ops structure defines a family of ops:
> get_trip_temp(), get_trip_hyst() and get_trip_type(). Each of them
> returns a property of a trip point.
>
> As a result, the code calls these ops everywhere to get a trip point
> that is supposed to be defined in the backend driver. That makes no
> sense, as a thermal trip can be generic and used by the backend
> driver to declare its trip points.
>
> Part of the thermal framework has been changed and all the OF thermal
> drivers are using the same definition for the trip point and use a
> thermal zone registration variant to pass those trip points which are
> part of the thermal zone device structure.
>
> Consequently, we can use a generic function to get the trip points
> when they are stored in the thermal zone device structure.
>
> This approach can be generalized to all the drivers and we can get rid
> of the ops->get_trip_*. That will result in much simpler code and
> make it possible to rework how the thermal trips are handled in the
> thermal core framework, as discussed previously.
>
> This change adds a function thermal_zone_get_trip() where we get the
> thermal trip point structure which contains all the properties (type,
> temp, hyst) instead of doing multiple calls to ops->get_trip_*.
>
> That opens the door for trip point extension with more attributes. For
> instance, replacing the trip points disabled bitmask with a 'disabled'
> field in the structure.
>
> Here we replace all the calls to ops->get_trip_* in the thermal core
> code with a call to the thermal_zone_get_trip() function.
>
> While at it, add thermal_zone_get_num_trips() to encapsulate the
> code more and reduce the coupling with the thermal framework internals.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>

This patch landed in linux next-20220923 as commit 78ffa3e58d93
("thermal/core: Add a generic thermal_zone_get_trip() function").
Unfortunately it introduces a deadlock:

thermal_zone_device_update() calls handle_thermal_trip() with tz->lock
held, which in turn calls thermal_zone_get_trip(), which tries to take
tz->lock again. I've tried to fix this by switching
handle_thermal_trip() to call __thermal_zone_get_trip().

This fixes the issue in this change alone, but when I applied the fix
on top of linux next-20220923, it failed again. It looks like the other
changes also assume that calling thermal_zone_get_trip() is possible
with tz->lock held: in my case handle_non_critical_trips() called
step_wise_throttle(), which in turn called thermal_zone_get_trip(). I
gave up fixing this. Please re-check the possible call paths and adjust
the locking accordingly.
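
To illustrate the pattern, here is a minimal userspace sketch, not kernel
code. It assumes a pthread mutex standing in for tz->lock, and all names in
it (struct zone, zone_get_trip(), zone_get_trip_locked(), zone_update()) are
hypothetical. An error-checking mutex is used so that the second lock attempt
returns EDEADLK instead of hanging, which a kernel mutex would not do.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct trip {
	int temperature;
	int hysteresis;
};

struct zone {
	pthread_mutex_t lock;	/* plays the role of tz->lock */
	int num_trips;
	struct trip trips[2];
};

/* Use an error-checking mutex so that relocking reports EDEADLK. */
static void zone_init(struct zone *z)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	assert(ret == 0);
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(ret == 0);
	ret = pthread_mutex_init(&z->lock, &attr);
	assert(ret == 0);
	(void)ret;
	pthread_mutexattr_destroy(&attr);

	z->num_trips = 2;
	z->trips[0] = (struct trip){ 70000, 2000 };
	z->trips[1] = (struct trip){ 90000, 2000 };
}

/* Lockless variant: the caller must already hold z->lock. */
static int zone_get_trip_locked(struct zone *z, int id, struct trip *out)
{
	if (id < 0 || id >= z->num_trips)
		return -EINVAL;
	*out = z->trips[id];
	return 0;
}

/* Locking variant: takes z->lock itself, so the caller must not hold it. */
static int zone_get_trip(struct zone *z, int id, struct trip *out)
{
	int ret = pthread_mutex_lock(&z->lock);

	if (ret)
		return -ret;	/* -EDEADLK if this thread already owns the lock */
	ret = zone_get_trip_locked(z, id, out);
	pthread_mutex_unlock(&z->lock);
	return ret;
}

/* Models the update path: trips are looked up while the lock is held. */
static int zone_update(struct zone *z, int use_locked_variant, struct trip *out)
{
	int ret;

	pthread_mutex_lock(&z->lock);
	ret = use_locked_variant ? zone_get_trip_locked(z, 0, out)
				 : zone_get_trip(z, 0, out);
	pthread_mutex_unlock(&z->lock);
	return ret;
}
```

Build with -pthread. Calling zone_update() with use_locked_variant set to 0
models the failing path (the nested lock attempt is rejected), while setting
it to 1 models switching the caller to the lockless variant. The same choice
has to be made for every helper reachable from a path that holds the lock.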


> ---
> drivers/thermal/thermal_core.c | 87 +++++++++++++++++++++++--------
> drivers/thermal/thermal_helpers.c | 28 +++++-----
> drivers/thermal/thermal_netlink.c | 21 ++++----
> drivers/thermal/thermal_sysfs.c | 66 +++++++++--------------
> include/linux/thermal.h | 5 ++
> 5 files changed, 118 insertions(+), 89 deletions(-)
>
> ...

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland