Re: [PATCH V10 4/4] thermal: qcom: add support for PMIC5 Gen3 ADC thermal monitoring

From: Daniel Lezcano

Date: Mon Apr 27 2026 - 04:22:34 EST

Hi Jishnu,

On 4/25/26 12:37, Jishnu Prakash wrote:

Hi Daniel,

[ ... ]

Actually, if get_sdam_from_irq() or adc5_gen3_read() fail, they will return without clearing the interrupt flag, so we should potentially end up in an infinite loop.

I went over this discussion and the code again and I need to explain a few things.

First, please note that individual PMIC drivers do not handle interrupt clearing by
themselves - they are under the SPMI PMIC arbiter (drivers/spmi/spmi-pmic-arb.c),
which is an interrupt controller and the arbiter driver handles IRQ clearing internally
for all the PMIC drivers under it. So we won't be clearing the interrupt explicitly here.

So the status should be cleared at the end with IRQ_HANDLED. IRQ_NONE returned if it is for another subsystem.

Maybe my explanation above wasn't sufficiently detailed - the "status" I mentioned
above is not related to the interrupt by itself, it is just the ADC's status register
reading. If it indicates an ADC conversion fault, the fault has to be cleared by
calling the function adc5_gen3_status_clear() before we can make subsequent conversion
requests (like for .set_trips) on the ADC.

There are 3 main reasons for which the Gen3 ADC interrupt may fire:

1. End of conversion for immediate conversion request (handled in main ADC driver)
2. Threshold violation for TM conversions (handled in this auxiliary driver)
3. Conversion fault (handled in both drivers)

I think it would make the most sense if I make changes so that the ISRs
(of both main and aux driver) return IRQ_HANDLED only when any of the above
3 cases are detected and they return IRQ_NONE in case of errors or when the
interrupt is confirmed to be for the other driver.

Does this sound fine?

Thanks for clarifying. It should be fine.

If you think there can be a significant number of errors in the handler may be you should add statistics but later in an additional series if it makes sense.

[ ... ]

[ ... ]

+    dev_dbg(adc_tm5->dev, "channel:%s, low_temp(mdegC):%d, high_temp(mdegC):%d\n",
+        prop->common_props.label, low_temp, high_temp);
+
+    guard(adc5_gen3)(adc_tm5);
+    if (high_temp == INT_MAX && low_temp == -INT_MAX)
+        return adc_tm5_gen3_disable_channel(prop);

Why disable the channel instead of returning an errno ?

This is the convention we follow in our existing ADC_TM driver at
drivers/thermal/qcom/qcom-spmi-adc-tm5.c. If both upper and lower
thresholds are meant to be disabled, we disable the channel fully
in HW to save some power and it can be enabled later if this API
is called for it with valid thresholds.

Is it considered invalid in the thermal framework to try to disable
both thresholds? Should I both disable the channel and return some
error from here?

Well, if the channel is disabled, then the temperature sensor of the thermal zone is disabled, consequently the thermal zone is disabled from a HW POV but enabled from the kernel POV.

Why not add the 'change_mode' ops and then disable the thermal zone (+ pm_runtime) ?

I have a doubt about this part - if the thermal framework sends threshold values
of (-INT_MAX, INT_MAX) in the .set_trips callback, doesn't it mean that the
framework is already trying to disable the thermal sensor?

No, let me clarify this:

* the thermal thresholds are set by the userspace to get notification when a temperature is crossing the way up or/and down a specified limit. Thresholds can be deleted, added, flushed, etc ...

* the thermal trips are specified by the firmware and should not be changed even if I agree there are the writable trip points which IMO we can consider for debug purpose

If you specify a trip/threshold point with min=-INT_MAX and max=INT_MAX, semantically speaking it is correct but in practice these temperatures won't be reached so it is like the trip point is disabled but it is a side effect.

A trip point is a property of a thermal zone.

A thermal sensor is represented by a thermal zone.

A thermal zone can be enabled without trip points or thresholds. The userspace is still able to read a temperature through netlink or sysfs.

Or does it just mean threshold monitoring functionality can be disabled for now,
but other functionality like temperature reading is still expected to work?

Please note that adc_tm5_gen3_disable_channel() only disables the threshold
monitoring and interrupt generation functionality - the get_temp() call to read
temperature will still work.

Above you stated we receive a notification when the conversion is finished. So how do you read the temperature if there is no interrupt telling to read the converted value ?

Perhaps I should not have used the wording
"disable the channel" above, as calling adc_tm5_gen3_disable_channel() is not
exactly the same as fully disabling the thermal zone.

Please let me know if any change is needed there - should I return any error
after calling adc_tm5_gen3_disable_channel() ?

I understand, your point. Perhaps just put it apart and address it later as an optimization when the driver is merged and everything is clarified.

[ ... ]

+    /*
+     * Skipping first SDAM IRQ as it is requested in parent driver.
+     * If there is a TM violation on that IRQ, the parent driver calls
+     * the notifier (adctm_event_handler) exposed from this driver to handle it.
+     */
+    for (i = 1; i < adc_tm5->dev_data->num_sdams; i++) {
+        ret = devm_request_threaded_irq(dev,
+                        adc_tm5->dev_data->base[i].irq,
+                        NULL, adctm5_gen3_isr, IRQF_ONESHOT,
+                        adc_tm5->dev_data->base[i].irq_name,
+                        adc_tm5);

The threaded interrupts set the isr in a thread and from the thread
handling the event, there is a work queue scheduled. Why not use the
top and bottom halves of the threaded interrupt ? Hopefully you should
be able to remove the lock.

Yes, I can use the top and bottom halves of the threaded interrupt as you
suggested. But what exactly do you mean by removing the lock?

If you meant the mutex lock used in this driver, we cannot remove that.
This is because the ADC_TM driver needs to write into several registers
shared with the main ADC driver for setting new thresholds, so we
have to share a mutex between the drivers to prevent concurrency issues.

When using a workqueue tampering with registers while an interrupt handler is doing the same, the lock is needed.

But if the workqueue is replaced by threaded interrupt, the lock *may* not be needed because the design may prevent race conditions.

That may be not true in this case, I did not investigate deeper in the code to figure it out. Let's see the next version

I think a lock will not be needed with the change I planned, but you
can judge it in the next version.

Sure, thanks

-- Daniel