Re: [PATCH 2/2] thermal: qcom: Add support for Qualcomm MBG thermal monitoring
From: Konrad Dybcio
Date: Tue Jun 23 2026 - 06:42:06 EST
On 6/23/26 12:14 PM, Sachin Gupta wrote:
>
>
> On 6/19/2026 5:44 PM, Konrad Dybcio wrote:
>> On 6/19/26 8:45 AM, Sachin Gupta wrote:
>>>
>>>
>>> On 6/16/2026 3:40 PM, Konrad Dybcio wrote:
>>>> On 6/1/26 1:01 PM, Sachin Gupta wrote:
>>>>> From: Satya Priya Kakitapalli <quic_skakitap@xxxxxxxxxxx>
>>>>>
>>>>> Add driver for the Qualcomm MBG thermal monitoring device. It monitors
>>>>> the die temperature, and when there is a level 1 upper threshold
>>>>> violation, it receives an interrupt over spmi. The driver reads
>>>>> the fault status register and notifies thermal accordingly.
>>>>>
>>>>> Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@xxxxxxxxxxx>
>>>>> Co-developed-by: Sachin Gupta <sachin.gupta@xxxxxxxxxxxxxxxx>
>>>>> Signed-off-by: Sachin Gupta <sachin.gupta@xxxxxxxxxxxxxxxx>
>>>>> ---
>>
>> [...]
>>
>>>>> + /*
>>>>> + * Configure the last_temp one degree higher, to ensure the
>>>>> + * violated temp is returned to thermal framework when it reads
>>>>> + * temperature for the first time after the violation happens.
>>>>> + * This is needed to account for the inaccuracy in the conversion
>>>>> + * formula used which leads to the thermal framework setting back
>>>>> + * the same thresholds in case the temperature it reads does not
>>>>> + * show violation.
>>>>> + */
>>>>> + chip->last_temp = temp + MBG_TEMP_CONSTANT;
>>>>
>>>> Will this work fine if the user tries to set the max temp supported
>>>> by the hardware (i.e. is there headroom for max+1)?
>>>>
>>>
>>> In the current implementation, temp == MBG_MAX_SUPPORTED_TEMP is not accepted (temp < MBG_MAX_SUPPORTED_TEMP), so the last_temp = temp + MBG_TEMP_CONSTANT path is never taken at absolute max. For accepted trips (strictly below max), there is headroom for the +1C adjustment.
>>
>> You check for `temp < MBG_MAX_SUPPORTED_TEMP` and there's:
>>
>> #define MBG_MAX_SUPPORTED_TEMP 160000,
>>
>> so passing temp=159999 is "valid" and after the addition it becomes 160999,
>> which in my understanding is outside the range
>>
>> Konrad
>
> chip->last_temp is only a software cache used in one place, mbg_tm_get_temp(), to return a synthetic “trip violated” reading once after the IRQ. It is not programmed into any hardware register. So temp + MBG_TEMP_CONSTANT exceeding MBG_MAX_SUPPORTED_TEMP does not cause a hardware out-of-range condition.
>
> Do you see this as an issue?
Right, I zoomed in too much
Konrad