Re: [PATCH 08/11] irqchip/gic: Configure SGIs as standard interrupts
From: Marc Zyngier
Date: Wed Apr 21 2021 - 06:58:46 EST
Hi Dann,
On Tue, 20 Apr 2021 22:25:51 +0100,
dann frazier <dann.frazier@xxxxxxxxxxxxx> wrote:
>
> On Tue, Apr 20, 2021 at 02:37:10PM -0600, dann frazier wrote:
> > On Tue, May 19, 2020 at 05:17:52PM +0100, Marc Zyngier wrote:
> > > Change the way we deal with GIC SGIs by turning them into proper
> > > IRQs, and calling into the arch code to register the interrupt range
> > > instead of a callback.
> > >
> > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> >
> > hey Marc,
> >
> > I bisected a boot failure on our Gigabyte R120-T33 systems (ThunderX
> > CN88XX) down to this commit, but only when running in ACPI mode. See below:
> >
> >
> > EFI stub: Booting Linux Kernel...
> > EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
> > EFI stub: Using DTB from configuration table
> > EFI stub: Exiting boot services and installing virtual address map...
> > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0a11]
> > [ 0.000000] Linux version 5.11.0-13-generic (buildd@bos02-arm64-067) (gcc (Ubuntu 10.2.1-23ubuntu1) 10.2.1 20210312, GNU ld (GNU Binutils for Ubuntu) 2.36.1) #14-Ubuntu SMP Fri Mar 19 16:57:35 UTC 2021 (Ubuntu 5.11.0-13.14-generic 5.11.7)
>
> Sorry, realized I posted a log from an Ubuntu kernel. Here's an
> upstream one:
[...]
>
> [ 7.842174] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
> [ 7.849699] io scheduler mq-deadline registered
> [ 7.857591] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> [ 7.865127] efifb: probing for efifb
> [ 7.868738] efifb: No BGRT, not showing boot graphics
> [ 7.873783] efifb: framebuffer at 0x881010000000, using 3072k, total 3072k
> [ 7.880649] efifb: mode is 1024x768x32, linelength=4096, pages=1
> [ 7.886647] efifb: scrolling: redraw
> [ 7.890212] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> [ 7.895905] fbcon: Deferring console take-over
> [ 7.900350] fb0: EFI VGA frame buffer device
> [ 7.905289] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
> [ 7.913714] ACPI: button: Power Button [PWRB]
> [ 7.919549] ACPI GTDT: [Firmware Bug]: failed to get the Watchdog base address.
> [ 7.927289] Unable to handle kernel read from unreadable memory at virtual address 0000000000000028
> [ 7.936326] Mem abort info:
> [ 7.939108] ESR = 0x96000004
> [ 7.942151] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 7.947451] SET = 0, FnV = 0
> [ 7.950494] EA = 0, S1PTW = 0
> [ 7.953624] Data abort info:
> [ 7.956492] ISV = 0, ISS = 0x00000004
> [ 7.960316] CM = 0, WnR = 0
> [ 7.963273] [0000000000000028] user address but active_mm is swapper
> [ 7.969616] Internal error: Oops: 96000004 [#1] SMP
> [ 7.974483] Modules linked in:
> [ 7.977531] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc8 #19
> [ 7.983874] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS F02 08/06/2019
> [ 7.990737] pstate: 40400085 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
> [ 7.996732] pc : __ipi_send_mask+0x60/0x114
> [ 8.000910] lr : smp_cross_call+0x40/0xcc
> [ 8.004913] sp : ffff800012753c10
> [ 8.008216] x29: ffff800012753c10 x28: ffff000100de5d00
> [ 8.013521] x27: 000000000000000a x26: ffff80001225da20
> [ 8.018825] x25: 0000000000000000 x24: ffff000ff62719b0
> [ 8.024129] x23: ffff80001225d000 x22: ffff800012368108
> [ 8.029433] x21: ffff800010f69a20 x20: 0000000000000000
> [ 8.034737] x19: ffff000100143c60 x18: 0000000000000020
> [ 8.040041] x17: 000000008e74252f x16: 00000000bf0ab2ad
> [ 8.045345] x15: ffffffffffffffff x14: 0000000000000000
> [ 8.050649] x13: 003d090000000000 x12: 00003d0900000000
> [ 8.055953] x11: 0000000000000000 x10: 00003d0900000000
> [ 8.061257] x9 : ffff800010027f14 x8 : 0000000000000000
> [ 8.066561] x7 : 00000000ffffffff x6 : ffff000ff6148698
> [ 8.071865] x5 : ffff80001159d040 x4 : ffff80001159d110
> [ 8.077169] x3 : ffff800010f69a00 x2 : 0000000000000000
> [ 8.082473] x1 : ffff800010f69a20 x0 : 0000000000000000
> [ 8.087777] Call trace:
> [ 8.090213] __ipi_send_mask+0x60/0x114
> [ 8.094038] smp_cross_call+0x40/0xcc
> [ 8.097691] smp_send_reschedule+0x3c/0x50
> [ 8.101778] resched_curr+0x5c/0xb0
> [ 8.105258] check_preempt_curr+0x58/0x90
> [ 8.109258] ttwu_do_wakeup+0x2c/0x190
> [ 8.112996] ttwu_do_activate+0x7c/0x114
> [ 8.116909] try_to_wake_up+0x388/0x670
> [ 8.120735] wake_up_process+0x24/0x30
> [ 8.124474] swake_up_one+0x48/0x9c
> [ 8.127953] rcu_gp_kthread_wake+0x68/0x8c
> [ 8.132041] rcu_accelerate_cbs_unlocked+0xb4/0xf0
> [ 8.136822] rcu_core+0x520/0x694
> [ 8.140128] rcu_core_si+0x1c/0x2c
> [ 8.143520] __do_softirq+0x128/0x388
> [ 8.147172] irq_exit+0xc4/0xec
> [ 8.150304] __handle_domain_irq+0x8c/0xec
> [ 8.154394] gic_handle_irq+0xd8/0x2f0
> [ 8.158132] el1_irq+0xc0/0x180
> [ 8.161262] __pi_strcmp+0x20/0x158
> [ 8.164742] driver_register+0x68/0x140
> [ 8.168571] __platform_driver_register+0x34/0x40
> [ 8.173265] imx8mp_clk_driver_init+0x28/0x34
> [ 8.177614] do_one_initcall+0x50/0x260
> [ 8.181440] kernel_init_freeable+0x24c/0x2d4
> [ 8.185790] kernel_init+0x20/0x134
> [ 8.189271] ret_from_fork+0x10/0x18
> [ 8.192840] Code: a90363f7 aa0103f5 d0010957 f9401260 (b9402800)
> [ 8.198955] ---[ end trace c24172add816c1f0 ]---
> [ 8.203562] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> [ 8.210442] SMP: stopping secondary CPUs
> [ 9.258360] SMP: failed to stop secondary CPUs 0,9
> [ 9.263141] Kernel Offset: disabled
> [ 9.266617] CPU features: 0x00040002,69101108
> [ 9.270963] Memory Limit: none
> [ 9.274024] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

Please feed this stacktrace to scripts/decode_stacktrace.sh so that I
can get an idea about what is going wrong. I bet something is playing
ungodly games with one of the IPIs, and things go horribly wrong.

Now, here's a hunch: in the fine TX1 tradition, the firmware is broken
and the GTDT table looks unusable. Amusingly, the crash happens right
after the SBSA watchdog fails to probe.

And looking at the code that implements that driver, it looks dodgy as
hell, as it unmaps an interrupt it doesn't even know is valid. And it
does that right when the driver fails the way you experienced it. If,
by any chance, the interrupt field is 0 in the firmware table, this
results in SGI0 being unmapped. Given that this is the rescheduling
interrupt, fireworks happen.
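
To make the suspected failure mode concrete, here is a toy userspace
model of the ordering problem. It is only a sketch, not kernel code:
every toy_* name is made up for illustration, the mapping table is a
plain array, and it assumes (as guessed above) that the firmware table
carries zeroed frame addresses and an interrupt field of 0.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_GSI 32	/* arbitrary table size for the model */

/* toy_mapped[n] is true while interrupt n has a live mapping. */
static bool toy_mapped[TOY_NR_GSI];

/* Hypothetical stand-in for the watchdog entry of the firmware table. */
struct toy_watchdog {
	unsigned long control_frame;
	unsigned long refresh_frame;
	unsigned int gsi;
};

/* Map an interrupt; an already-mapped one is simply reused. */
static int toy_map_gsi(unsigned int gsi)
{
	assert(gsi < TOY_NR_GSI);
	toy_mapped[gsi] = true;
	return (int)gsi + 1;	/* any positive value means success */
}

/* Tear down whatever mapping exists for this interrupt number. */
static void toy_unmap_gsi(unsigned int gsi)
{
	assert(gsi < TOY_NR_GSI);
	toy_mapped[gsi] = false;
}

/* Old ordering: map first, validate later, unmap blindly on failure. */
static int toy_probe_old(const struct toy_watchdog *wd)
{
	int irq = toy_map_gsi(wd->gsi);

	if (!(wd->refresh_frame && wd->control_frame)) {
		toy_unmap_gsi(wd->gsi);	/* may hit someone else's mapping */
		return -1;
	}
	return irq;
}

/* New ordering: validate first, so the early exit has nothing to undo. */
static int toy_probe_new(const struct toy_watchdog *wd)
{
	if (!(wd->refresh_frame && wd->control_frame))
		return -1;
	return toy_map_gsi(wd->gsi);
}

/* Returns true if interrupt 0 is still mapped after the probe ran. */
static bool toy_sgi0_survives(int (*probe)(const struct toy_watchdog *))
{
	const struct toy_watchdog broken = { 0, 0, 0 };

	toy_mapped[0] = true;	/* interrupt 0 is in use before the probe */
	(void)probe(&broken);
	return toy_mapped[0];
}
```

Calling toy_sgi0_survives(toy_probe_old) reports that the mapping for
interrupt 0 is gone, while toy_sgi0_survives(toy_probe_new) leaves it
in place. The model only captures the ordering issue; it says nothing
about how the real irqdomain code behaves.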

Can you have a go with the patchlet below, and let me know if that
helps?

Thanks,

	M.

diff --git a/drivers/acpi/arm64/gtdt.c b/drivers/acpi/arm64/gtdt.c
index f2d0e5915dab..0a0a982f9c28 100644
--- a/drivers/acpi/arm64/gtdt.c
+++ b/drivers/acpi/arm64/gtdt.c
@@ -329,7 +329,7 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 					int index)
 {
 	struct platform_device *pdev;
-	int irq = map_gt_gsi(wd->timer_interrupt, wd->timer_flags);
+	int irq;
 
 	/*
 	 * According to SBSA specification the size of refresh and control
@@ -338,7 +338,7 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 	struct resource res[] = {
 		DEFINE_RES_MEM(wd->control_frame_address, SZ_4K),
 		DEFINE_RES_MEM(wd->refresh_frame_address, SZ_4K),
-		DEFINE_RES_IRQ(irq),
+		{},
 	};
 	int nr_res = ARRAY_SIZE(res);
 
@@ -348,10 +348,11 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 
 	if (!(wd->refresh_frame_address && wd->control_frame_address)) {
 		pr_err(FW_BUG "failed to get the Watchdog base address.\n");
-		acpi_unregister_gsi(wd->timer_interrupt);
 		return -EINVAL;
 	}
 
+	irq = map_gt_gsi(wd->timer_interrupt, wd->timer_flags);
+	res[2] = (struct resource)DEFINE_RES_IRQ(irq);
 	if (irq <= 0) {
 		pr_warn("failed to map the Watchdog interrupt.\n");
 		nr_res--;
@@ -364,7 +365,8 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 	 */
 	pdev = platform_device_register_simple("sbsa-gwdt", index, res, nr_res);
 	if (IS_ERR(pdev)) {
-		acpi_unregister_gsi(wd->timer_interrupt);
+		if (irq > 0)
+			acpi_unregister_gsi(wd->timer_interrupt);
 		return PTR_ERR(pdev);
 	}
 
--
Without deviation from the norm, progress is not possible.