Re: [PATCH 2/2] thermal: qcom: Add support for Qualcomm MBG thermal monitoring

From: Sachin Gupta

Date: Tue Jun 23 2026 - 06:19:17 EST




On 6/19/2026 5:44 PM, Konrad Dybcio wrote:
On 6/19/26 8:45 AM, Sachin Gupta wrote:


On 6/16/2026 3:40 PM, Konrad Dybcio wrote:
On 6/1/26 1:01 PM, Sachin Gupta wrote:
From: Satya Priya Kakitapalli <quic_skakitap@xxxxxxxxxxx>

Add driver for the Qualcomm MBG thermal monitoring device. It monitors
the die temperature, and when there is a level 1 upper threshold
violation, it receives an interrupt over spmi. The driver reads
the fault status register and notifies thermal accordingly.

Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@xxxxxxxxxxx>
Co-developed-by: Sachin Gupta <sachin.gupta@xxxxxxxxxxxxxxxx>
Signed-off-by: Sachin Gupta <sachin.gupta@xxxxxxxxxxxxxxxx>
---

[...]

+    /*
+     * Configure the last_temp one degree higher, to ensure the
+     * violated temp is returned to thermal framework when it reads
+     * temperature for the first time after the violation happens.
+     * This is needed to account for the inaccuracy in the conversion
+     * formula used which leads to the thermal framework setting back
+     * the same thresholds in case the temperature it reads does not
+     * show violation.
+     */
+    chip->last_temp = temp + MBG_TEMP_CONSTANT;

Will this work fine if the user tries to set the max temp supported
by the hardware (i.e. is there headroom for max+1)?


In the current implementation, a trip at temp == MBG_MAX_SUPPORTED_TEMP is rejected (the check is temp < MBG_MAX_SUPPORTED_TEMP), so the last_temp = temp + MBG_TEMP_CONSTANT path is never taken at the absolute maximum. For accepted trips (strictly below max), there is headroom for the +1C adjustment.

You check for `temp < MBG_MAX_SUPPORTED_TEMP` and there's:

#define MBG_MAX_SUPPORTED_TEMP 160000

so passing temp=159999 is "valid", and after the addition it becomes 160999,
which in my understanding is outside the range.

Konrad
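
For reference, the arithmetic in question can be reproduced outside the driver. The following is only a sketch, not code from the patch: MBG_MAX_SUPPORTED_TEMP is quoted above, MBG_TEMP_CONSTANT is assumed to be 1000 (one degree in millidegrees Celsius, inferred from the 159999 -> 160999 example), and all three helper names are hypothetical.

```c
#include <stdbool.h>

/* Quoted in the thread (millidegrees Celsius). */
#define MBG_MAX_SUPPORTED_TEMP 160000
/* Assumption: "one degree higher" expressed in millidegrees Celsius. */
#define MBG_TEMP_CONSTANT      1000

/* Hypothetical helper mirroring the `temp < MBG_MAX_SUPPORTED_TEMP` check. */
static inline bool mbg_trip_accepted(int temp)
{
	return temp < MBG_MAX_SUPPORTED_TEMP;
}

/* Hypothetical helper: the value that would be stored in last_temp. */
static inline int mbg_cached_temp(int temp)
{
	return temp + MBG_TEMP_CONSTANT;
}

/*
 * Hypothetical stricter check (not in the patch): accept a trip only if
 * the adjusted value still fits within the supported range.
 */
static inline bool mbg_trip_has_headroom(int temp)
{
	return temp <= MBG_MAX_SUPPORTED_TEMP - MBG_TEMP_CONSTANT;
}
```

Under these assumptions, mbg_trip_accepted(159999) is true while mbg_cached_temp(159999) yields 160999, which is above MBG_MAX_SUPPORTED_TEMP. A check along the lines of mbg_trip_has_headroom() would be one possible way to keep the adjusted value in range, if that turns out to matter.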

chip->last_temp is only a software cache used in one place, mbg_tm_get_temp(), to return a synthetic “trip violated” reading once after the IRQ. It is not programmed into any hardware register. So temp + MBG_TEMP_CONSTANT exceeding MBG_MAX_SUPPORTED_TEMP does not cause a hardware out-of-range condition.
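
To make the one-shot behaviour described above concrete, here is a minimal standalone sketch. It is not the driver's code: the struct, the last_temp_valid flag, the hw_temp stand-in, and both function names are hypothetical, and MBG_TEMP_CONSTANT is assumed to be 1000 millidegrees Celsius.

```c
#include <stdbool.h>

/* Assumption: one degree in millidegrees Celsius. */
#define MBG_TEMP_CONSTANT 1000

/* Hypothetical stand-in for the driver's per-chip state. */
struct mbg_sketch_chip {
	int last_temp;        /* software cache only, never written to hardware */
	bool last_temp_valid; /* hypothetical flag: set on IRQ, cleared on read */
	int hw_temp;          /* stand-in for a temperature converted from hardware */
};

/* IRQ path: remember the violated trip, bumped by one degree. */
static inline void mbg_sketch_on_violation(struct mbg_sketch_chip *chip,
					   int trip_temp)
{
	chip->last_temp = trip_temp + MBG_TEMP_CONSTANT;
	chip->last_temp_valid = true;
}

/* get_temp path: report the cached value once, then the hardware value. */
static inline int mbg_sketch_get_temp(struct mbg_sketch_chip *chip)
{
	if (chip->last_temp_valid) {
		chip->last_temp_valid = false;
		return chip->last_temp;
	}
	return chip->hw_temp;
}
```

In this sketch a violation at 159999 makes the first read return 160999 and later reads return hw_temp. No register is touched, but the value above 160000 is still what the caller of the get_temp path receives on that first read.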

Do you see this as an issue?

Thanks,
Sachin