Re: [PATCH 2/2] thermal: qcom: Add support for Qualcomm MBG thermal monitoring
From: Sachin Gupta
Date: Tue Jun 23 2026 - 06:19:17 EST
On 6/19/2026 5:44 PM, Konrad Dybcio wrote:
On 6/19/26 8:45 AM, Sachin Gupta wrote:
On 6/16/2026 3:40 PM, Konrad Dybcio wrote:
On 6/1/26 1:01 PM, Sachin Gupta wrote:
From: Satya Priya Kakitapalli <quic_skakitap@xxxxxxxxxxx>
Add driver for the Qualcomm MBG thermal monitoring device. It monitors
the die temperature, and when there is a level 1 upper threshold
violation, it receives an interrupt over spmi. The driver reads
the fault status register and notifies thermal accordingly.
Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@xxxxxxxxxxx>
Co-developed-by: Sachin Gupta <sachin.gupta@xxxxxxxxxxxxxxxx>
Signed-off-by: Sachin Gupta <sachin.gupta@xxxxxxxxxxxxxxxx>
---
[...]
+ /*
+ * Configure the last_temp one degree higher, to ensure the
+ * violated temp is returned to thermal framework when it reads
+ * temperature for the first time after the violation happens.
+ * This is needed to account for the inaccuracy in the conversion
+ * formula used which leads to the thermal framework setting back
+ * the same thresholds in case the temperature it reads does not
+ * show violation.
+ */
+ chip->last_temp = temp + MBG_TEMP_CONSTANT;
Will this work fine if the user tries to set the max temp supported
by the hardware (i.e. is there headroom for max+1)?
In the current implementation, temp == MBG_MAX_SUPPORTED_TEMP is not accepted (temp < MBG_MAX_SUPPORTED_TEMP), so the last_temp = temp + MBG_TEMP_CONSTANT path is never taken at absolute max. For accepted trips (strictly below max), there is headroom for the +1C adjustment.
You check for `temp < MBG_MAX_SUPPORTED_TEMP` and there's:
#define MBG_MAX_SUPPORTED_TEMP 160000,
so passing temp=159999 is "valid" and after the addition it becomes 160999,
which in my understanding is outside the range
Konrad
chip->last_temp is only a software cache used in one place, mbg_tm_get_temp(), to return a synthetic “trip violated” reading once after the IRQ. It is not programmed into any hardware register. So temp + MBG_TEMP_CONSTANT exceeding MBG_MAX_SUPPORTED_TEMP does not cause a hardware out-of-range condition.
Do you see this as an issue?
Thanks,
Sachin