Re: [PATCH v8 00/29] Rework the trip points creation

From: Marek Szyprowski
Date: Thu Oct 06 2022 - 12:26:09 EST

Hi Daniel,

On 06.10.2022 08:55, Daniel Lezcano wrote:
> On 05/10/2022 15:05, Marek Szyprowski wrote:
>> On 05.10.2022 14:37, Daniel Lezcano wrote:
>>> On 03/10/2022 23:18, Daniel Lezcano wrote:
>>> [ ... ]
>>>>> I've tested this v8 patchset after fixing the issue with Exynos TMU
>>>>> with
>>>>> patch and I got the following lockdep warning on all Exynos-based
>>>>> boards:
>>>>> ======================================================
>>>>> WARNING: possible circular locking dependency detected
>>>>> 6.0.0-rc1-00083-ge5c9d117223e #12945 Not tainted
>>>>> ------------------------------------------------------
>>>>> swapper/0/1 is trying to acquire lock:
>>>>> c1ce66b0 (&data->lock#2){+.+.}-{3:3}, at: exynos_get_temp+0x3c/0xc8
>>>>> but task is already holding lock:
>>>>> c2979b94 (&tz->lock){+.+.}-{3:3}, at:
>>>>> thermal_zone_device_update.part.0+0x3c/0x528
>>>>> which lock already depends on the new lock.
>>>> I'm wondering if the problem is not already there and related to
>>>> data->lock ...
>>>> Doesn't the thermal zone lock already prevent racy access to the data
>>>> structure?
>>>> Another question: if the sensor clock is disabled after reading it,
>>>> how does the hardware update the temperature and detect the programed
>>>> threshold is crossed?
>>> just a gentle ping, as the fix will depend on your answer ;)
>> Sorry, I've been busy with other stuff. I thought I will fix this once I
>> find a bit of spare time.
> Ok, that is great if you can find time to fix it up because I've other
> drivers to convert to the generic thermal trips.
>> IMHO the clock management is a bit over-engineered, as there is little
>> (if any) benefit from such fine grade clock management. That clock is
>> needed only for the AHB related part of the TMU (reading/writing the
>> registers). The IRQ generation and temperature measurement is clocked
>> from so called 'sclk' (special clock).
>> I also briefly looked at the code and the internal lock doesn't look to
>> be really necessary assuming that the thermal core already serializes
>> all the calls.
> I looked at the code and I think the driver can be simplified (fixed?)
> even more.
> IIUC, the sensor has multiple trip point interrupts, so if the device
> tree is describing more trip points than the sensor supports, there is
> a warning and the number of trip point is capped.
> IMO that can be simplified by using two trip point interrupt because
> the thermal_zone_device_update() will call the set_trips callback with
> the new boundaries. IOW, the thermal framework sets a new trip point
> interrupt when one is crossed.
> That should result in the simplification of the tmu_control as well as
> the tmu_probe function. As well as removing the limitation of the
> number of trip points.
> In order to have that correctly working, the 'set_trips' ops must be
> used to call the tmu_control callback instead of calling it in tmu_probe.
> The intialization workflow should be:
> probe->...
>  ->thermal_zone_device_register()
>   ->thermal_zone_device_update()
>    ->update_trip_points()
>     ->ops->set_trips()
>       ->tmu_control()
> Also, replace the workqueue by a threaded interrupt.
> Does it make sense?

Yes, definitely. Frankly speaking I've never looked into that code, so I
was not aware that it needs some cleanup.

Best regards
Marek Szyprowski, PhD
Samsung R&D Institute Poland