Re: [PATCH] thermal/core/power_allocator: avoid cdev->state can not be reset

From: Lukasz Luba
Date: Mon Mar 13 2023 - 07:18:31 EST




On 3/13/23 11:10, Xuewen Yan wrote:
Hi Lukasz

On Mon, Mar 13, 2023 at 5:35 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:

Hi Xuewen,

On 3/13/23 01:40, Xuewen Yan wrote:
Hi Lukasz

On Fri, Mar 10, 2023 at 11:56 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:

Hi Di,

On 3/9/23 13:55, Di Shen wrote:
Commit 0952177f2a1f (thermal/core/power_allocator: Update once cooling devices when temp is low)
add a update flag to update cooling device only once when temp is low.
But when the switch_on_temp is set to be a higher value, the cooling device state
may not be reset to max, because the last_temp is smaller than the switch_on_temp.

For example:
First:
swicth_on_temp=70 control_temp=85;

Then userspace change the trip_temp:
swicth_on_temp=45 control_temp=55 cur_temp=54

Then userspace reset the trip_temp:
swicth_on_temp=70 control_temp=85 cur_temp=57 last_temp=54

At this time, the cooling device state should be reset to be max.
However, because cur_temp(57) < switch_on_temp(70)
last_temp(54) < swicth_on_temp(70) --> update = false
When update is false, the cooling device state can not be reset.

That's a tricky use case. How is that now possible,

We use the trip_temp in the Android System. Often, we set different
control temperatures in different scenarios,
and when we change the switch_on_temp from small to bigger, we find
the power can not be reset to be max.

I see, thanks for letting me know that this is Android.




So delete the update condition, so that the cooling device state
could be reset.

IMO this is not the desired solution. Daniel reported the issue that
IPA triggers the event sent to user-space even when there is no need.
That's the motivation for the 0952177f2a1f change.

To address your scenario properly, we need an interface which allows
to respond properly for such situation when someone from user-space
writes a new value to something fundamental as trip point.

You also have a kernel config enabled:
CONFIG_THERMAL_WRITABLE_TRIPS
which IMO is only for debug kernels for system integrator (according
to the Kconfig description).

Yes, we use it to meet the temperature control needs of different scenarios.
And now in android with google's GKI2.0, the config must be opened.

OK



When you disable this config in your deploy/product kernel
than this issue would disappear.


Fixes: 0952177f2a1f (thermal/core/power_allocator: Update once cooling devices when temp is low)
Signed-off-by: Di Shen <di.shen@xxxxxxxxxx>
---
drivers/thermal/gov_power_allocator.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)


That's why IMO this is not the solution.

Yes, but I think we should fix the bug, although the
CONFIG_THERMAL_WRITABLE_TRIPS is just for debugging.
How about record the last_trip_temp, and when the last_temp >
last_trip_temp, set the update tobe true?

Yes, if that config is used in Android then we must fix it.

That last_trip_temp makes sense (but maybe name it last_switch_on_temp).
Please put that new field into the IPA local
struct power_allocator_params. We should store the trip temp
value there every time power_allocator_throttle() is called.
That can be called due to a write from user-space w/ a new trip point
value, so should be OK.

Thanks for the suggestion. We would send the patch-v2 as soon as possible.

Thanks!
I'll review that and check on my board.
BTW, which board/device you use with this IPA? Maybe I can buy it
and also test to capture such regression in future.