Re: MSI irqchip configured as IRQCHIP_ONESHOT_SAFE causes spurious IRQs

From: Ramon Fried
Date: Fri Jan 17 2020 - 08:32:21 EST


Thomas,
Thanks a lot for the detailed answer.

> > Basically, 32 MSI vectors are represented by a single GIC irq.
> > There's a status register in which every bit corresponds to an MSI vector, and
> > each individual MSI needs to be acked in that register. Whenever any bit is
> > asserted, the GIC IRQ level is high.
>
> Which is not that bad.
>
> >> > It's configured with handle_level_irq() as the GIC is level IRQ.
> >>
> >> Which is completely bonkers. MSI has edge semantics and sharing an
> >> interrupt line for edge type interrupts is broken by design, unless the
> >> hardware which handles the incoming MSIs and forwards them to the level
> >> type interrupt line is designed properly and the driver does the right
> >> thing.
> >
> > Yes, the design of the HW is sort of broken. I concur.
>
> As you describe it, it's not that bad.
>
> >> > The ack callback acks the GIC irq. The mask/unmask callbacks call
> >> > pci_msi_mask_irq() / pci_msi_unmask_irq().
> >>
> >> What? How is that supposed to work with multiple MSIs?
> > Acking is per MSI vector as I described above, so it should work.
>
> No. This is the wrong approach. Let's look at the hardware:
>
> | GIC line X |------| MSI block | <--- Messages from devices
>
> The MSI block latches the incoming message up to the point where it is
> acknowledged in the MSI block. This makes sure that the level semantics
> of the GIC are met.
>
> So from a software perspective you want to do something like this:
>
> gic_irq_handler()
> {
>         mask_ack(gic_irqX);
>
>         pending = read(msi_status);
>         for_each_bit(bit, pending) {
>                 ack(msi_status, bit); // This clears the latch in the MSI block
>                 handle_irq(irqof(bit));
>         }
>         unmask(gic_irqX);
> }
>
> And that works perfectly correct without masking the MSI interrupt at
> the PCI level for a threaded handler simply because the PCI device will
> not send another interrupt until the previous one has been handled by
> the driver unless the PCI device is broken.
I'm missing something here: doesn't this implementation block the IRQ only during
the hardirq handler, and not during the threaded handler? (Assuming that I
selected handle_level_irq() as the default handler.)
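
To make the question concrete, the consumer I have in mind is a plain threaded
handler with no primary handler; a minimal sketch, with made-up names:

static irqreturn_t my_dev_thread_fn(int irq, void *dev_id)
{
	/* Long-running handling, runs in thread context. */
	return IRQ_HANDLED;
}

static int my_dev_setup_irq(struct pci_dev *pdev, void *priv)
{
	/*
	 * No primary handler, so the core requires IRQF_ONESHOT, which is
	 * what should keep the interrupt masked until the thread finishes.
	 */
	return request_threaded_irq(pci_irq_vector(pdev, 0), NULL,
				    my_dev_thread_fn, IRQF_ONESHOT,
				    "my-dev", priv);
}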

Actually, my current implementation is very similar to what
you just described:

static void eq_msi_isr(struct irq_desc *desc)
{
	struct irq_chip *chip = irq_desc_get_chip(desc);
	struct eq_msi *msi;
	u16 status;
	unsigned long bitmap;
	u32 bit;
	u32 virq;

	chained_irq_enter(chip, desc);
	msi = irq_desc_get_handler_data(desc);

	/* Keep draining as long as any MSI request bit is asserted. */
	while ((status = readw(msi->gcsr_regs_base + LINK_GCSR5_OFFSET)
			& MSI_IRQ_REQ) != 0) {
		pr_debug("MSI: %x\n", status >> 12);

		/* The per-vector pending bits sit in the upper bits of the status word. */
		bitmap = status >> 12;
		for_each_set_bit(bit, &bitmap, msi->num_of_vectors) {
			virq = irq_find_mapping(msi->inner_domain, bit);
			if (virq) {
				generic_handle_irq(virq);
			} else {
				pr_err("unexpected MSI\n");
				handle_bad_irq(desc);
			}
		}
	}
	chained_irq_exit(chip, desc);
}
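
(For reference, this chained handler is installed from probe with
irq_set_chained_handler_and_data(); roughly the following, where the field
holding the GIC interrupt number is a made-up name:)

	irq_set_chained_handler_and_data(msi->gic_irq, eq_msi_isr, msi);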

Additionally, the domain allocation is defined like this:
static int eq_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
			       unsigned int nr_irqs, void *args)
{
	struct eq_msi *msi = domain->host_data;
	unsigned long bit;
	u32 mask;

	/* We only allow 32 MSI per device */
	WARN_ON(nr_irqs > 32);
	if (nr_irqs > 32)
		return -ENOSPC;

	bit = find_first_zero_bit(msi->used, msi->num_of_vectors);
	if (bit >= msi->num_of_vectors)
		return -ENOSPC;

	set_bit(bit, msi->used);

	mask = readw(msi->gcsr_regs_base + LINK_GCSR6_OFFSET);
	mask |= BIT(bit) << 12;
	writew(mask, msi->gcsr_regs_base + LINK_GCSR6_OFFSET);

	irq_domain_set_info(domain, virq, bit, &eq_msi_bottom_irq_chip,
			    domain->host_data, handle_level_irq,
			    NULL, NULL);

	pr_debug("Enabling MSI irq: %lu\n", bit);

	return 0;
}
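
For completeness, the eq_msi_bottom_irq_chip referenced above is set up along
the lines I described earlier in the thread; roughly (the ack callback name
here is made up):

static struct irq_chip eq_msi_bottom_irq_chip = {
	.name		= "EQ-MSI",
	.irq_ack	= eq_msi_ack,		/* acks the GIC irq */
	.irq_mask	= pci_msi_mask_irq,
	.irq_unmask	= pci_msi_unmask_irq,
	.flags		= IRQCHIP_ONESHOT_SAFE,	/* as per $SUBJECT */
};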

>
> If that's the case, then you have to work around that at the device
> driver level, not at the irq chip level, by installing a primary handler
> which quiesces the device (not the MSI part).
>
> Thanks,
>
> tglx
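
If I go down that road, I understand the device driver side would then look
roughly like this (reusing the made-up names from my sketch above, now with a
primary handler that quiesces the device rather than the MSI block):

static irqreturn_t my_dev_primary(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;

	/*
	 * Quiesce the device itself (not the MSI machinery) so it cannot
	 * raise another interrupt before the thread has run.
	 */
	my_dev_quiesce(dev);
	return IRQ_WAKE_THREAD;
}

...and in my_dev_setup_irq() the registration becomes:

	return request_threaded_irq(pci_irq_vector(pdev, 0), my_dev_primary,
				    my_dev_thread_fn, 0, "my-dev", priv);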