Re: [PATCH v2] irqchip/bcm2835: Quiesce IRQs left enabled by bootloader

From: Lukas Wunner
Date: Sun Feb 23 2020 - 13:24:54 EST


On Sun, Feb 23, 2020 at 06:59:56PM +0100, Stefan Wahren wrote:
> thanks for all the investigation. Unfortunately the patch below doesn't
> compile, since it lacks the definition of REG_FIQ_ENABLE.

Ugh, I recall fixing that when compile-testing. I must have forgotten
to invoke "git commit --amend" before "git format-patch".

> Btw the name is a little unfortunate because it defines a single flag
> within REG_FIQ_CONTROL instead of a separate register.

The Foundation's repo uses that name, so I stuck with it to reduce the
number of merge conflicts Phil will have to resolve. Happy to change it
though, suggestions welcome.
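
For reference, here is a minimal userspace sketch of the missing
definition and the check it feeds. The values are assumptions on my
part and are not taken from the patch below: 0x0c as the offset of the
FIQ control register and 0x80 (bit 7) as its enable flag. The names
fake_intc and quiesce_fiq are hypothetical stand-ins for the iomapped
register block and the init-time check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: FIQ control register offset and its enable flag (bit 7). */
#define REG_FIQ_CONTROL	0x0c
#define REG_FIQ_ENABLE	0x80	/* a flag inside REG_FIQ_CONTROL, not a register */

/* Hypothetical stand-in for the memory-mapped register block. */
struct fake_intc {
	uint32_t regs[16];	/* indexed by byte offset / 4 */
};

/*
 * Clear the FIQ control register if the enable flag is set.
 * Returns true if the FIQ had been left enabled.
 */
static bool quiesce_fiq(struct fake_intc *intc)
{
	uint32_t reg;

	assert(intc != NULL);

	reg = intc->regs[REG_FIQ_CONTROL / 4];
	if (!(reg & REG_FIQ_ENABLE))
		return false;

	/* Writing 0 clears the enable flag together with the source bits. */
	intc->regs[REG_FIQ_CONTROL / 4] = 0;
	return true;
}
```

The low bits of the same register select the FIQ source, which is why
the sketch tests only the flag and leaves the register alone when the
flag is clear.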

Thanks!

Lukas

> >
> > -- >8 --
> > From: Lukas Wunner <lukas@xxxxxxxxx>
> > Subject: [PATCH] irqchip/bcm2835: Quiesce IRQs left enabled by bootloader
> >
> > Per the spec, the BCM2835's IRQs are all disabled when coming out of
> > power-on reset. Its IRQ driver assumes that's still the case when the
> > kernel boots and does not perform any initialization of the registers.
> > However the Raspberry Pi Foundation's bootloader leaves the USB
> > interrupt enabled when handing over control to the kernel.
> >
> > Quiesce IRQs and the FIQ if they were left enabled and log a message to
> > let users know that they should update the bootloader once a fixed
> > version is released.
> >
> > If the USB interrupt is not quiesced and the USB driver later on claims
> > the FIQ (as it does on the Raspberry Pi Foundation's downstream kernel),
> > interrupt latency for all other peripherals increases and occasional
> > lockups occur. That's because both the FIQ and the normal USB interrupt
> > fire simultaneously.
> >
> > On a multicore Raspberry Pi, if normal interrupts are routed to CPU 0
> > and the FIQ to CPU 1 (hardcoded in the Foundation's kernel), then a USB
> > interrupt causes CPU 0 to spin in bcm2836_chained_handle_irq() until the
> > FIQ on CPU 1 has cleared it. Other peripherals' interrupts are starved
> > for as long. I've seen CPU 0 blocked for up to 2.9 msec. eMMC throughput
> > on a Compute Module 3 irregularly dips to 23.0 MB/s without this commit
> > but remains relatively constant at 23.5 MB/s with this commit.
> >
> > The lockups occur when CPU 0 receives a USB interrupt while holding a
> > lock which CPU 1 is trying to acquire while the FIQ is temporarily
> > disabled on CPU 1. At best users get RCU CPU stall warnings, but most
> > of the time the system just freezes.
> >
> > Fixes: 89214f009c1d ("ARM: bcm2835: add interrupt controller driver")
> > Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> > Reviewed-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx # v3.7+
> > Cc: Serge Schneider <serge@xxxxxxxxxxxxxxx>
> > Cc: Kristina Brooks <notstina@xxxxxxxxx>
> > ---
> > drivers/irqchip/irq-bcm2835.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/drivers/irqchip/irq-bcm2835.c b/drivers/irqchip/irq-bcm2835.c
> > index 418245d..eca9ac7 100644
> > --- a/drivers/irqchip/irq-bcm2835.c
> > +++ b/drivers/irqchip/irq-bcm2835.c
> > @@ -135,6 +135,7 @@ static int __init armctrl_of_init(struct device_node *node,
> >  {
> >  	void __iomem *base;
> >  	int irq, b, i;
> > +	u32 reg;
> >
> >  	base = of_iomap(node, 0);
> >  	if (!base)
> > @@ -157,6 +158,19 @@ static int __init armctrl_of_init(struct device_node *node,
> >  				handle_level_irq);
> >  			irq_set_probe(irq);
> >  		}
> > +
> > +		reg = readl_relaxed(intc.enable[b]);
> > +		if (reg) {
> > +			writel_relaxed(reg, intc.disable[b]);
> > +			pr_err(FW_BUG "Bootloader left irq enabled: "
> > +			       "bank %d irq %*pbl\n", b, IRQS_PER_BANK, &reg);
> > +		}
> > +	}
> > +
> > +	reg = readl_relaxed(base + REG_FIQ_CONTROL);
> > +	if (reg & REG_FIQ_ENABLE) {
> > +		writel_relaxed(0, base + REG_FIQ_CONTROL);
> > +		pr_err(FW_BUG "Bootloader left fiq enabled\n");
> >  	}
> >
> >  	if (is_2836) {