Re: [PATCH 08/11] irqchip/gic: Configure SGIs as standard interrupts

From: Marc Zyngier
Date: Wed Apr 21 2021 - 11:49:23 EST


On Wed, 21 Apr 2021 15:52:52 +0100,
dann frazier <dann.frazier@xxxxxxxxxxxxx> wrote:
>
> [ + Fu Wei ]

[...]

> >
> > Please feed this stacktrace to scripts/decode_stacktrace.sh so that I
> > can get an idea about what is going wrong. I bet something is playing
> > ungodly games with one of the IPIs, and things go horribly wrong.
>
> hey Marc,
> Sure:
>
> [ 7.927289] Unable to handle kernel read from unreadable memory at virtual address 0000000000000028
> [ 7.936326] Mem abort info:
> [ 7.939108] ESR = 0x96000004
> [ 7.942151] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 7.947451] SET = 0, FnV = 0
> [ 7.950494] EA = 0, S1PTW = 0
> [ 7.953624] Data abort info:
> [ 7.956492] ISV = 0, ISS = 0x00000004
> [ 7.960316] CM = 0, WnR = 0
> [ 7.963273] [0000000000000028] user address but active_mm is swapper
> [ 7.969616] Internal error: Oops: 96000004 [#1] SMP
> [ 7.974483] Modules linked in:
> [ 7.977531] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc8 #19
> [ 7.983874] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS F02 08/06/2019
> [ 7.990737] pstate: 40400085 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
> [ 7.996732] pc : __ipi_send_mask (/home/ubuntu/linux/./include/linux/irqdomain.h:537 /home/ubuntu/linux/kernel/irq/ipi.c:283)
> [ 8.000910] lr : smp_cross_call (/home/ubuntu/linux/arch/arm64/kernel/smp.c:958)
> [ 8.004913] sp : ffff800012753c10
> [ 8.008216] x29: ffff800012753c10 x28: ffff000100de5d00
> [ 8.013521] x27: 000000000000000a x26: ffff80001225da20
> [ 8.018825] x25: 0000000000000000 x24: ffff000ff62719b0
> [ 8.024129] x23: ffff80001225d000 x22: ffff800012368108
> [ 8.029433] x21: ffff800010f69a20 x20: 0000000000000000
> [ 8.034737] x19: ffff000100143c60 x18: 0000000000000020
> [ 8.040041] x17: 000000008e74252f x16: 00000000bf0ab2ad
> [ 8.045345] x15: ffffffffffffffff x14: 0000000000000000
> [ 8.050649] x13: 003d090000000000 x12: 00003d0900000000
> [ 8.055953] x11: 0000000000000000 x10: 00003d0900000000
> [ 8.061257] x9 : ffff800010027f14 x8 : 0000000000000000
> [ 8.066561] x7 : 00000000ffffffff x6 : ffff000ff6148698
> [ 8.071865] x5 : ffff80001159d040 x4 : ffff80001159d110
> [ 8.077169] x3 : ffff800010f69a00 x2 : 0000000000000000
> [ 8.082473] x1 : ffff800010f69a20 x0 : 0000000000000000
> [ 8.087777] Call trace:
> [ 8.090213] __ipi_send_mask (/home/ubuntu/linux/./include/linux/irqdomain.h:537 /home/ubuntu/linux/kernel/irq/ipi.c:283)

Thanks for that. This resolves to:

	if (irq_domain_is_ipi_per_cpu(data->domain)) {

data->domain is NULL, and we probably are using freed memory...

> > Now, here's a hunch: in the fine TX1 tradition, the firmware is broken
> > and the GTDT table looks unusable. Amusingly, the crash happens right
> > after the SBSA watchdog fails to probe.
>
> Yeah, I noticed that, but didn't highlight it as I didn't see it in
> the backtrace...
>
> > And looking at the code that implements that driver, it looks dodgy as
> > hell, as it unmaps an interrupt it doesn't even know is valid. And it
> > does that right when the driver fails the way you experienced it. If,
> > by any chance, the interrupt field is 0 in the firmware table, this
> > results in SGI0 being unmapped. Given that this is the rescheduling
> > interrupt, fireworks happen.
>
> ... and that explains why. I wouldn't have gotten there, but wish I'd
> thought to test w/ the watchdog compiled out :(

No worries. This IRQ series has uncovered a number of terrible driver
behaviours since I merged it, and these bugs are worth every penny.

> > Can you have a go with the patchlet below, and let me know if that
> > helps?
>
> It does!

Awesome. I'll Cc you on the actual patch, feel free to respond with a
Tested-by: if you want.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.