Re: [Patch v5 2/6] thermal: qcom: Add support for LMh driver

From: Thara Gopinath
Date: Tue Aug 31 2021 - 10:52:36 EST




On 8/23/21 11:57 AM, Daniel Lezcano wrote:

Hi Bjorn,

On 23/08/2021 17:05, Bjorn Andersson wrote:
On Sat 21 Aug 02:41 PDT 2021, Daniel Lezcano wrote:


Hi Thara,

On 09/08/2021 21:16, Thara Gopinath wrote:
Driver enabling various pieces of Limits Management Hardware (LMh) for CPU
cluster0 and CPU cluster1, namely kick-starting monitoring of temperature,
current, and battery current violations, enabling the reliability algorithm,
and setting up various temperature limits.

The following has been explained in the cover letter. I am including this
here so that this remains in the commit message as well.

LMh is a hardware infrastructure on some Qualcomm SoCs that can enforce
temperature and current limits as programmed by software for certain IPs
like the CPU. On many newer SoCs, LMh is configured by firmware/TZ and no
programming is needed from the kernel side. But on certain SoCs like sdm845,
the firmware does not completely program the h/w. On such SoCs, kernel
software has to explicitly set up the temperature limits and turn on
various monitoring and enforcing algorithms on the hardware.

Tested-by: Steev Klimaszewski <steev@xxxxxxxx> # Lenovo Yoga C630
Signed-off-by: Thara Gopinath <thara.gopinath@xxxxxxxxxx>

Is it possible to have an option to disable/enable the LMh driver at
runtime, for instance with a module parameter?


Are you referring to being able to disable the hardware throttling, or
the driver's changes to thermal pressure?

The former.

Hi Daniel,

It is not recommended to turn off LMh once it is enabled. From a h/w point of view, it can be done for debug purposes, but it is not meant to be implemented as a feature.



I'm not aware of any way to disable the hardware. I do remember that
there were some experiments done (with a hacked-up boot chain) early on,
and iirc it was concluded that it's not a good idea.

My objective was to test the board with the thermal framework handling
the mitigation instead of the hardware.

I guess I can set the hardware temperature higher than the thermal zone
temperature.

Right. Also remember that patch 5 in this series removes the cooling devices for the cpu thermal zones. So if you are testing this, you will have to add them back.


Which sensor does the LMh refer to? The cluster one?

(By the way, the per-core thermal zone temperatures are 5°C lower than
the hardware mitigation temperature. Is that done on purpose?)


So IIUC, it refers to tsens for the individual cpus and collates the input. But the documentation is not clear on this one. I took the mitigation temperature from downstream code. Yes, I did realize that the thermal zone trip1 temp is 90 degrees whereas the LMh mitigation point is 95 degrees. My thinking is that this is because the h/w mitigation can happen faster than s/w, hence the 5 degree bump in temperature.


Either way, if there is a way and there is a use for it, we can always
add such a parameter incrementally. So I suggest that we merge this as is.

Yes, that was for my information. It is already merged.

Thank you very much


Thanks

-- Daniel


--
Warm Regards
Thara (She/Her/Hers)