Re: [PATCH v14 01/18] irqchip/sifive-plic: Convert PLIC driver into a platform driver
From: Samuel Holland
Date: Wed May 29 2024 - 18:04:16 EST
Hi Geert,

On 2024-05-29 9:22 AM, Geert Uytterhoeven wrote:
> Hi Anup,
>
> On Thu, Feb 22, 2024 at 10:41 AM Anup Patel <apatel@xxxxxxxxxxxxxxxx> wrote:
>> The PLIC driver does not require very early initialization so convert
>> it into a platform driver.
>>
>> After conversion, the PLIC driver is probed after CPUs are brought-up
>> so setup cpuhp state after context handler of all online CPUs are
>> initialized otherwise PLIC driver crashes for platforms with multiple
>> PLIC instances.
>>
>> Signed-off-by: Anup Patel <apatel@xxxxxxxxxxxxxxxx>
>
> Thanks for your patch, which is now commit 8ec99b033147ef3b
> ("irqchip/sifive-plic: Convert PLIC driver into a platform
> driver") in v6.9.
>
> It looks like this conversion is causing issues on BeagleV Starlight
> Beta. After updating esmil/visionfive to v6.10-rc1, the kernel usually
> fails to boot. Adding "earlycon keep_bootcon" reveals these differences:
>
> -riscv-plic c000000.interrupt-controller: mapped 133 interrupts with 2
> handlers for 4 contexts.
> +------------[ cut here ]------------
> +WARNING: CPU: 0 PID: 1 at drivers/irqchip/irq-sifive-plic.c:373
> plic_handle_irq+0xf2/0xf6
> +Modules linked in:
> +CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 6.10.0-rc1-starlight-02342-g0ba4c76ca0e8-dirty #323
> +Hardware name: BeagleV Starlight Beta (DT)
> +epc : plic_handle_irq+0xf2/0xf6
> + ra : generic_handle_domain_irq+0x1c/0x2a
> +epc : ffffffff8033f994 ra : ffffffff8006319a sp : ffffffc800003f50
> + gp : ffffffff812d63f0 tp : ffffffd8800b8000 t0 : 0000000000000040
> + t1 : 0000000000000000 t2 : 0000000000001000 s0 : ffffffc800003fa0
> + s1 : 0000000000000009 a0 : ffffffd880183600 a1 : 0000000000000009
> + a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> + a5 : 0000000000000000 a6 : ffffffd880400248 a7 : ffffffd8804002b8
> + s2 : ffffffd9f8fac458 s3 : 0000000000000004 s4 : 0000000000000000
> + s5 : ffffffff81293f58 s6 : ffffffd88014ac00 s7 : 0000000000000004
> + s8 : ffffffc800013b2c s9 : ffffffc800013b34 s10: 0000000000000006
> + s11: ffffffd9f8fc1458 t3 : 0000000000000002 t4 : 0000000000000402
> + t5 : ffffffd8800610c0 t6 : ffffffd8800610e0
> +status: 0000000200000100 badaddr: ffffffd9f8fac458 cause: 0000000000000003
> +[<ffffffff8033f994>] plic_handle_irq+0xf2/0xf6
> +[<ffffffff8006319a>] generic_handle_domain_irq+0x1c/0x2a
> +[<ffffffff8033d7aa>] riscv_intc_irq+0x26/0x60
> +[<ffffffff806c92ee>] handle_riscv_irq+0x4a/0x74
> +[<ffffffff806d2346>] call_on_irq_stack+0x32/0x40
> +---[ end trace 0000000000000000 ]---
> +Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000004
> +Oops [#1]
> +Modules linked in:
> +CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W
> 6.10.0-rc1-starlight-02342-g0ba4c76ca0e8-dirty #323
> +Hardware name: BeagleV Starlight Beta (DT)
> +epc : plic_handle_irq+0x66/0xf6
> + ra : generic_handle_domain_irq+0x1c/0x2a
> +epc : ffffffff8033f908 ra : ffffffff8006319a sp : ffffffc800003f50
> + gp : ffffffff812d63f0 tp : ffffffd8800b8000 t0 : 0000000000000040
> + t1 : 0000000000000000 t2 : 0000000000001000 s0 : ffffffc800003fa0
> + s1 : 0000000000000009 a0 : ffffffd880183600 a1 : 0000000000000009
> + a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> + a5 : ffffffff8033d72a a6 : ffffffd880400248 a7 : ffffffd8804002b8
> + s2 : ffffffd9f8fac458 s3 : 0000000000000004 s4 : ffffffd880183630
> + s5 : ffffffff81293f58 s6 : ffffffff812948a0 s7 : ffffffff80c4e660
> + s8 : ffffffff80d9eea0 s9 : ffffffc800013b34 s10: 0000000000000006
> + s11: ffffffd9f8fc1458 t3 : 0000000000000002 t4 : 0000000000000402
> + t5 : ffffffd8800610c0 t6 : ffffffd8800610e0
> +status: 0000000200000100 badaddr: 0000000000000004 cause: 000000000000000d
> +[<ffffffff8033f908>] plic_handle_irq+0x66/0xf6
> +[<ffffffff8006319a>] generic_handle_domain_irq+0x1c/0x2a
> +[<ffffffff8033d7aa>] riscv_intc_irq+0x26/0x60
> +[<ffffffff806c92ee>] handle_riscv_irq+0x4a/0x74
> +[<ffffffff806d2346>] call_on_irq_stack+0x32/0x40
> +Code: 8b93 d70b 5b17 00f5 0b13 fa8b fc17 00a5 0c13 5a0c (a783) 0009
> +---[ end trace 0000000000000000 ]---
> +Kernel panic - not syncing: Fatal exception in interrupt
> +SMP: stopping secondary CPUs
> +---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>
> As "mapped 133 interrupts" is no longer printed, it looks like an
> unexpected early interrupt comes in while still in plic_probe().
>
> Esmil suggested reverting all of:
> a7fb69ffd7ce438a irqchip/sifive-plic: Avoid explicit cpumask allocation on stack
> abb7205794900503 irqchip/sifive-plic: Improve locking safety by using
> irqsave/irqrestore
> 95652106478030f5 irqchip/sifive-plic: Parse number of interrupts and
> contexts early in plic_probe()
> a15587277a246c38 irqchip/sifive-plic: Cleanup PLIC contexts upon
> irqdomain creation failure
> 6c725f33d67b53f2 irqchip/sifive-plic: Use riscv_get_intc_hwnode() to
> get parent fwnode
> b68d0ff529a939a1 irqchip/sifive-plic: Use devm_xyz() for managed allocation
> 25d862e183d4efeb irqchip/sifive-plic: Use dev_xyz() in-place of pr_xyz()
> 8ec99b033147ef3b irqchip/sifive-plic: Convert PLIC driver into a platform driver
>
> After this, the PLIC is initialized earlier again, and this indeed
> seems to fix the issue for me.
> Before, the kernel booted fine in only ca. 1 out of 5 tries.
> After the reverts, it booted 5/5.
>
> Do you know what's going on? Is there a simpler fix?

The fact that you hit the warning indicates that plic_handle_irq() was called
before handler->present was set. Previously the PLIC driver was probed very
early, so it was unlikely that any peripheral already had a pending interrupt.
Now, while platform device drivers would not yet be able to request interrupts
(because the irqdomain is not registered yet), they could have programmed the
hardware in a way that generates an interrupt. If that interrupt was enabled at
the PLIC (e.g. by the bootloader), then we could expect plic_handle_irq() to be
called as soon as irq_set_chained_handler() is called.

So the fix is to not call irq_set_chained_handler() until after the handlers are
completely set up.

I've sent a patch doing this:
https://lore.kernel.org/linux-riscv/20240529215458.937817-1-samuel.holland@xxxxxxxxxx/
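
To illustrate the ordering problem described above, here is a minimal
user-space sketch. It is not the driver code and not the patch itself: every
name in it (model_handler, model_parent, model_set_chained_handler() and the
two probe functions) is hypothetical, and it assumes only what is stated
above, namely that an interrupt which is already pending reaches the handler
as soon as the chained handler is installed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-context handler state; not the driver's struct. */
struct model_handler {
	bool present;	/* true once the context is completely set up */
};

/* Hypothetical model of the parent interrupt that the PLIC is chained to. */
struct model_parent {
	bool pending;	/* already asserted, e.g. left enabled by the bootloader */
	void (*handler)(struct model_handler *);
	struct model_handler *data;
};

/* Counts calls that arrive before setup finished (the WARNING in the log). */
static int model_warnings;

static void model_handle_irq(struct model_handler *h)
{
	if (!h->present)
		model_warnings++;
}

/* Assumption: installing the handler delivers a pending interrupt at once. */
static void model_set_chained_handler(struct model_parent *p,
				      void (*fn)(struct model_handler *),
				      struct model_handler *h)
{
	p->handler = fn;
	p->data = h;
	if (p->pending) {
		p->pending = false;
		p->handler(p->data);
	}
}

/* Problematic order: install the chained handler, then finish the setup. */
int probe_handler_first(void)
{
	struct model_handler h = { .present = false };
	struct model_parent p = { .pending = true, .handler = NULL, .data = NULL };

	model_warnings = 0;
	model_set_chained_handler(&p, model_handle_irq, &h);
	h.present = true;
	return model_warnings;
}

/* Fixed order: finish the setup, then install the chained handler. */
int probe_setup_first(void)
{
	struct model_handler h = { .present = false };
	struct model_parent p = { .pending = true, .handler = NULL, .data = NULL };

	model_warnings = 0;
	h.present = true;
	model_set_chained_handler(&p, model_handle_irq, &h);
	return model_warnings;
}
```

In this model probe_handler_first() returns 1, because the pending interrupt
is handled before the context is marked present, while probe_setup_first()
returns 0. The sketch only covers the warning path, not the NULL pointer
dereference that follows it in the log.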
Regards,
Samuel