Re: Maybe a bug in kernel/irq/chip.c unmask_irq(), device IRQ masked unexpectedly. (re-formated the mail body, sorry)

From: Yi Zheng
Date: Mon Oct 14 2019 - 10:53:42 EST


Hi, Thomas

I canceled that patch. In my testing, that will not fix the problem.
Why the IRQ is unexpectedly masked, that is not an easy bug for me.
More time need to hacking the driver/kernel code.

Thanks for your reply.

Thomas Gleixner <tglx@xxxxxxxxxxxxx> ä2019å10æ14æåä äå8:34åéï
>
> On Tue, 8 Oct 2019, Yi Zheng wrote:
> > There is some defects on IRQ processing:
> >
> > (1) At the beginning of handle_level_irq(), the IRQ-28 is masked, and ACK
> > action is executed: On my machine, it runs the 'else' branch:
> >
> > static inline void mask_ack_irq(struct irq_desc *desc)
> > {
> > if (desc->irq_data.chip->irq_mask_ack) {
> > desc->irq_data.chip->irq_mask_ack(&desc->irq_data);
> > irq_state_set_masked(desc);
> > } else {
> > mask_irq(desc);
> > if (desc->irq_data.chip->irq_ack)
> > desc->irq_data.chip->irq_ack(&desc->irq_data);
> > }
> > }
> >
> > It is an 2-steps procedure:
> > 1. mask_irq()
> > 2. desc->irq_data.chip->irq_ack()
> >
> > the 2nd step, the function ptr is omap_mask_ack_irq(), which
> > _MASK_ the hardware INTC-IRQ-32 and then do the real ACK action.
>
> Sure. Where is the problem?
>
> > (2) mask_irq()/unmask_irq() are not atomic actions: They check the
> > IRQD_IRQ_MASKED flag firstly, and then mask/unmask the irq by calling
> > the function ptrs which installed by irq controller drv. Then, those 2
> > functions set/clear the IRQD_IRQ_MASKED flag.
> >
> > I think the sequence of the hw/sw action should be mirrored reversed:
> > mask_irq():
> > check IRQD_IRQ_MASKED;
> > set hardware IRQ mask register;
> > set software IRQD_IRQ_MASKED flag;
> >
> > unmask_irq():
> > check IRQD_IRQ_MASKED;
> > /* NOTE: should before the hw unmask action!! */
> > clear software IRQD_IRQ_MASKED flag;
> > clear hardware IRQ mask register;
> >
> > The current unmask_irq(), hw-mask action runs before sw-mask action,
> > which gives an very small time window. That cause an unexpected
> > iterated IRQ.
>
> It's completely irrelevant because _ALL_ those operations run with
> irq_desc->lock held. So nothing can actually observe that state.
>
> > Here is my the detail of my analyzing of handle_level_irq():
> >
> > (1) Let record the HW-IRQ-Controller Status and the SW-Flag IRQD_IRQ_MASKED
> > pair as following: (hw-mask, sw-mask).
> >
> > (2) In the 1st level of IRQ-28 ISR calling, in unmask_irq(), after the HW
> > unmask action, and before the sw-flag IRQD_IRQ_MASKED is cleared, there
> > is a VERY SMALL TIME WINDOW, in which, another IRQ-28 may triggered.
> >
> > In that time window, the mask status is (0, 1), which is no an valid
> > value.
>
> Again. Irrelevant because not observable.
>
> > My fixup is in the attachment, which remove the unexpected time window of
> > IRQ iteration.
>
> Please don't send attachments. See Documentation/process/submitting-patches.rst
>
> Thanks,
>
> tglx
>