Re: iio: ad7192: Pending IRQ on SDO/RDY causing retrigger of interrupt handler and missing every sample

From: Fabrizio Lamarque
Date: Fri Apr 07 2023 - 04:52:15 EST


On Wed, Apr 5, 2023 at 4:35 PM Lars-Peter Clausen <lars@xxxxxxxxxx> wrote:
>
> On 4/5/23 06:49, Nuno Sá wrote:
> > [...]
> > I just tested the patch, but, at least on the platform I'm working on
> > (I.MX6), it does not
> > solve the issue.
> > Whereas the thread describes the very same issue I am experiencing, I
> > am not sure
> > IRQ_DISABLE_UNLAZY would have any impact. By reading CPU registers I see
> > I was expecting to have. AFAIU, the lazy approach would be the responsible for
> > this behavior because when you call disable_irq() it does not really mask the
> > IRQ in HW. Just marks it as disabled and if another IRQ event comes set's it to
> > PENDING and only then masks at HW level... If we "unlazy" the IRQ, we should
> > mask the IRQ right away so I would expect not to have any pending IRQ...
> >
> >> the IRQ line disabled at the hardware level, but when the IRQ flag of
> >> the processor
> >> is set (and this happens even if the interrupt is masked in HW), the
> >> interrupt is immediately
> >> triggered anyway.
> >> (I see GPIOx_IMR cleared, so interrupt is masked, but GPIOx_ISR set. As soon
> >> as
> >> enable_irq() is called the interrupt is triggered. Just by clearing
> >> GPIOx_ISR before
> >> enable_irq() solves the issue. I might share a few debug printk).
>
> This sounds to me that the GPIO driver should implement an
> `irq_enable()` callback where it clears the pending bits before
> unmasking. This is consistent with what other GPIO IRQ drivers do.
>

Hello Lars,
thank you for your comments and for having a look at this.

I am not really confident in dissecting how ARM interrupt management
and NXP specific features. However, he "Delayed Interrupt" feature in
kernel documentation (specific to ARM interrupt implementation),
"prevents losing edge interrupts on hardware which does not store an
edge interrupt event while the interrupt is disabled at the hardware
level". This feature is always enabled for any ARM core.
Indeed, NXP hardware _stores_ the GPIO edge interrupt when the IRQ is
disabled. While not explicitly written in kernel docs, to my limited
understanding enable_irq() should not clear the pending edge IRQ
(eventually occurred when interrupt was disabled).
I do not see anything that clearly does not conform to documentation
in how IRQs are being managed.

> the driver was written with the assumption that
> irq_disable() will stop recording of IRQ events. If that does not hold
> we might have to switch from irq_enable/irq_disable to
> request_irq/free_irq, which will come with a slight overhead.

request_irq/free_irq should clear the interrupt flags.
However, since this change might have side effects on other ADI ADC
drivers and trigger functionality, I'd be unable to provide a valid
and verified patch (but I could test a proposed patch on my platform).

In order to give you a better understanding of the issue, I am
providing a simple log here.

Here are the GPIO registers involved (only relevant bits are masked):

ICR = Edge sensitivity (3 = FALLING)
ISR = Interrupt status register (1 = DETECTED) independent of IMR
IMR = Interrupt mask register, 1 = ENABLED, 0 = MASKED
STATUS = IRQ status/config register, IRQ_DISABLE_UNLAZY (20th bit) is set
STATE = internal interrupt state, IRQS_PENDING (10th bit) is never set
COMPLETION = done flag of completion

Here is what happens within ad_sigma_delta_single_conversion() function:

# cat in_voltage1_raw <-- First conversion
[ 1001.540803] ad_sigma_delta_single_conversion() entry
[ 1001.540837] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.545849] ad_sigma_delta_set_channel() before ad_sigma_delta_set_channel()
[ 1001.569456] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.569700] ad_sigma_delta_set_channel() before ad_sigma_delta_set_mode()
[ 1001.586455] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.595279] ad_sigma_delta_set_channel() before enable_irq()
[ 1001.604869] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.612409] ad_sigma_delta_set_channel() before
wait_for_completion_interruptible_timeout()
[ 1001.621859] ICR = 3, IMR = 1, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1001.997637] ad_sd_data_rdy_trig_poll() entry
[ 1002.005523] ad_sd_data_rdy_trig_poll() before disable_irq_nosync()
[ 1002.009805] ad_sd_data_rdy_trig_poll() before iio_trigger_poll()
[ 1002.015994] ad_sd_data_rdy_trig_poll() exit
[ 1002.026725] ad_sigma_delta_set_channel() before ad_sd_read_reg()
[ 1002.031136] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1002.037382] ad_sigma_delta_set_channel() before ad_sigma_delta_set_mode()
[ 1002.048902] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1002.057685] ad_sigma_delta_set_channel() exit
[ 1002.067249] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
8304626

# cat in_voltage1_raw <-- Any following conversion
[ 1003.968784] ad_sigma_delta_single_conversion() entry
[ 1003.971193] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1003.976224] ad_sigma_delta_set_channel() before ad_sigma_delta_set_channel()
[ 1003.989727] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1003.998723] ad_sigma_delta_set_channel() before ad_sigma_delta_set_mode()
[ 1004.008524] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1004.017512] ad_sigma_delta_set_channel() before enable_irq()
[ 1004.027031] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1004.034409] ad_sd_data_rdy_trig_poll() entry
[ 1004.042280] ad_sd_data_rdy_trig_poll() before disable_irq_nosync()
[ 1004.046560] ad_sd_data_rdy_trig_poll() before iio_trigger_poll()
[ 1004.052747] ad_sd_data_rdy_trig_poll() exit
[ 1004.063603] ad_sigma_delta_set_channel() before
wait_for_completion_interruptible_timeout()
[ 1004.067981] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 1
[ 1004.079159] ad_sigma_delta_set_channel() before ad_sd_read_reg()
[ 1004.088709] ICR = 3, IMR = 0, ISR = 0, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1004.096793] ad_sigma_delta_set_channel() before ad_sigma_delta_set_mode()
[ 1004.106339] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
[ 1004.115121] ad_sigma_delta_set_channel() exit
[ 1004.124737] ICR = 3, IMR = 0, ISR = 1, STATUS = 80402h, STATE =
4000h, COMPLETION = 0
8304626

The first time there is a delay on wait_for_completion_interruptible_timeout().
In the second throw the interrupt function is called as soon as
enable_irq() is executed, completion is already set before entering
the wait state.
You may see the ISR register gets set again after ad_sd_read_reg(),
and this is the only relevant difference between the two sequences.

Unless you have any other suggestions, I will send three simpler
patches relevant to this driver, and leave to you the changes required
to overcome the issue described here.
In any case, I am available to test it on my side.


Fabrizio