Re: [RFC PATCH] irqchip/gic, gic-v3: Ensure data visibility in peripheral

From: Leo Yan
Date: Wed Sep 01 2021 - 03:28:01 EST


Hi Marc,

On Wed, Sep 01, 2021 at 08:03:52AM +0100, Marc Zyngier wrote:
> On Wed, 01 Sep 2021 07:31:15 +0100,
> Leo Yan <leo.yan@xxxxxxxxxx> wrote:
> >
> > When an interrupt line is asserted, the GIC handles the interrupt with
> > the following flow (with EOImode == 1):
> >
> > gic_handle_irq()
> >   `> do_read_iar()                      => Change int state to active
> >   `> gic_write_eoir()                   => Drop int priority
> >   `> handle_domain_irq()
> >     `> generic_handle_irq_desc()
> >       `> handle_fasteoi_irq()
> >         `> handle_irq_event()           => Peripheral handler and
> >                                            de-assert int line
> >         `> cond_unmask_eoi_irq()
> >           `> chip->irq_eoi()
> >             `> gic_eoimode1_eoi_irq()   => Change int state to inactive
> >
> > In this flow, there is no explicit memory barrier between
> > handle_irq_event() and chip->irq_eoi(). It is possible that the
> > outstanding write issued in handle_irq_event() has not reached the
> > device when the callback chip->irq_eoi() is invoked, which can lead to
> > the following state transitions for a level-triggered interrupt:
> >
> > Flow                             | Interrupt state in GIC
> > ---------------------------------+-------------------------------------
> > Interrupt line is asserted       | 'inactive' -> 'pending'
> > do_read_iar()                    | 'pending' -> 'pending & active'
> > handle_irq_event()               | Write peripheral register but it's
> >                                  | not visible for device, so the
> >                                  | interrupt line is still asserted
> > chip->irq_eoi()                  | 'pending & active' -> 'pending'
> > ...
> > Produce spurious interrupt       |
> > with interrupt ID: 1024          |
>
> 1024? Surely not.

Sorry for the typo, it should be 1023.
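
For reference, a small sketch of the ID check. It assumes the GICv3
numbering, where INTIDs 1020-1023 are reserved for special purposes and
1023 is what an IAR read returns when nothing is pending. The macro and
helper names are made up for illustration and are not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values follow the GICv3 special INTID range. */
#define INTID_SPECIAL_FIRST 1020u
#define INTID_SPURIOUS      1023u

/* True for any of the reserved special INTIDs (1020-1023). */
static inline bool intid_is_special(uint32_t intid)
{
	return intid >= INTID_SPECIAL_FIRST && intid <= INTID_SPURIOUS;
}

/* True only for the "nothing pending" ID that an IAR read can return. */
static inline bool intid_is_spurious(uint32_t intid)
{
	return intid == INTID_SPURIOUS;
}
```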

>
> >                                  | Finally the peripheral register is
> >                                  | updated and the interrupt line is
> >                                  | deasserted: 'pending' -> 'inactive'
> >
> > To avoid this potential issue, this patch adds a wmb() barrier before
> > invoking the EOI operation. This makes sure the interrupt line is
> > de-asserted in the peripheral before the interrupt is deactivated in
> > the GIC, and so avoids the spurious interrupt.
>
> If you want to ensure completion of device-specific writes, why isn't
> this the job of the device driver to implement whatever semantic it
> desires?

It seems to me that this is a common requirement for all device drivers:
the outstanding transactions that de-assert the interrupt line must
reach the endpoint before the GIC driver deactivates the interrupt.

> What if the interrupt is (shock, horror!) driven by a system
> register instead?

Okay, this is a good reason why a barrier is not always needed.

> I think this is merely papering over a driver bug, and adds a
> significant cost to all interrupts for no good reasons.

Understood. The memory barrier can be added per device driver.
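
To make the race concrete, here is a minimal user-space model of the
sequence from the patch description. It is only a sketch, not kernel
code: the structures, helper names and the interrupt number 42 are
invented for illustration, priority drop and deactivation are collapsed
into one step, and the "driver-side completion" is modelled simply as
the posted write landing before EOI. How a real driver achieves that is
device specific.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTID_SPURIOUS 1023u /* ID returned when nothing is pending */
#define DEV_INTID      42u   /* hypothetical interrupt number for the model */

/* Model peripheral: a level-triggered line and one posted write. */
struct model_dev {
	bool line_asserted;   /* current level of the interrupt line */
	bool write_in_flight; /* "clear" write issued but not yet at the device */
};

/* Model GIC state for one level-triggered interrupt. */
struct model_gic {
	bool active;
};

/* The handler writes the clear register; the write is only posted. */
static void dev_post_clear(struct model_dev *d)
{
	d->write_in_flight = true;
}

/* The posted write reaches the device, which drops the line. */
static void dev_write_arrives(struct model_dev *d)
{
	if (d->write_in_flight) {
		d->write_in_flight = false;
		d->line_asserted = false;
	}
}

/* Level-triggered: pending follows the line; signal only if not active. */
static bool gic_signals_cpu(const struct model_gic *g, const struct model_dev *d)
{
	return d->line_asserted && !g->active;
}

/* Models an IAR read: acknowledge if something is pending, else 1023. */
static uint32_t gic_ack(struct model_gic *g, const struct model_dev *d)
{
	if (!gic_signals_cpu(g, d))
		return INTID_SPURIOUS;
	g->active = true;
	return DEV_INTID;
}

/* Models deactivation of the interrupt. */
static void gic_eoi(struct model_gic *g)
{
	g->active = false;
}

struct outcome {
	bool resignaled; /* GIC signalled the CPU again right after EOI */
	bool spurious;   /* the second IAR read returned 1023 */
};

/* Run one interrupt; flush_in_driver models a driver-side completion step. */
static struct outcome run_once(bool flush_in_driver)
{
	struct model_dev dev = { .line_asserted = true, .write_in_flight = false };
	struct model_gic gic = { .active = false };
	struct outcome out = { false, false };

	(void)gic_ack(&gic, &dev);       /* 'pending' -> 'pending & active' */
	dev_post_clear(&dev);            /* handler clears the source */
	if (flush_in_driver)
		dev_write_arrives(&dev); /* driver waits for the write to land */
	gic_eoi(&gic);                   /* deactivate */

	out.resignaled = gic_signals_cpu(&gic, &dev);
	dev_write_arrives(&dev);         /* late arrival in the unflushed case */
	if (out.resignaled)
		out.spurious = (gic_ack(&gic, &dev) == INTID_SPURIOUS);
	return out;
}
```

In this model run_once(false) re-signals the interrupt after EOI and the
following IAR read sees 1023, while run_once(true) does neither, with no
extra cost for interrupts that do not need the flush.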

Thanks for the quick response,
Leo