Re: [PATCH] arm64: kdump: fix interrupt handling done during machine_crash_shutdown
From: Mark Rutland
Date: Fri Mar 02 2018 - 11:57:55 EST
On Fri, Mar 02, 2018 at 04:44:13PM +0000, Mark Rutland wrote:
> On Fri, Mar 02, 2018 at 02:52:07PM +0100, Grzegorz Jaszczyk wrote:
> > 2018-03-02 14:15 GMT+01:00 Mark Rutland <mark.rutland@xxxxxxx>:
> > > Do you see this for a panic() in *any* interrupt handler?
> >
> > I only tested with these two interrupt handlers: watchdog and i2c, but I
> > think it will behave the same with others - I can try with others if
> > you want; any suggestion which? Maybe with some PPI interrupt instead?
> > >
> > > Can you trigger the issue with magic-sysrq c, for example?
> >
> > There is no problem when I trigger it via 'echo c >
> > /proc/sysrq-trigger' - it works well all the time. The problem appears
> > only when the kexec/kdump procedure is triggered from interrupt
> > context.
>
> I'd meant that you'd send sysrq + c over serial, rather than writing to
> /proc/sysrq-trigger. That way, the panic will be in the context of the
> UART IRQ handler.
>
> If that shows the issue, that's likely to be the easiest way for
> someone else to reproduce and investigate this.
FWIW, having just given this a go on my Juno R1 with v4.16-rc3
defconfig, the UART IRQs work fine in the crash kernel. That crash
happened in IRQ context:
[ 384.653153] Call trace:
[ 384.655581] sysrq_handle_crash+0x20/0x30
[ 384.659559] __handle_sysrq+0xa8/0x1a0
[ 384.663278] handle_sysrq+0x28/0x38
[ 384.666738] pl011_fifo_to_tty+0x150/0x1a8
[ 384.670801] pl011_int+0x30c/0x430
[ 384.674177] __handle_irq_event_percpu+0x5c/0x148
[ 384.678843] handle_irq_event_percpu+0x34/0x88
[ 384.683250] handle_irq_event+0x48/0x78
[ 384.687056] handle_fasteoi_irq+0xa8/0x180
[ 384.691119] generic_handle_irq+0x24/0x38
[ 384.695095] __handle_domain_irq+0x5c/0xb0
[ 384.699158] gic_handle_irq+0x58/0xa8
[ 384.702790] el1_irq+0xb0/0x128
[ 384.705907] cpuidle_enter_state+0x138/0x220
[ 384.710142] cpuidle_enter+0x18/0x20
[ 384.713690] call_cpuidle+0x1c/0x38
[ 384.717151] do_idle+0x1b0/0x1e8
[ 384.720354] cpu_startup_entry+0x20/0x28
[ 384.724246] rest_init+0xd0/0xe0
[ 384.727450] start_kernel+0x3e4/0x410
On a separate note, the crashkernel complained:

[ 0.224730] CPU: CPUs started in inconsistent modes

... which is a separate disaster. I suspect the kexec code failed to punt the
crash CPU back to EL2 as it should have.
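
For anyone following along, here is a rough user-space model of how that
warning can come about. It assumes the boot-mode bookkeeping works the way I
remember it from head.S and asm/virt.h: two slots that start out different,
with each CPU overwriting one of them depending on the exception level it
entered at. The struct and helper names are made up for illustration and are
not kernel API; only the two BOOT_CPU_MODE_* constants are taken from the
kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants as defined in arch/arm64/include/asm/virt.h. */
#define BOOT_CPU_MODE_EL1 0xe11u
#define BOOT_CPU_MODE_EL2 0xe12u

/* Hypothetical stand-in for the kernel's two-entry boot mode record. */
struct boot_modes {
	uint32_t mode[2];
};

/* Initial state: the two slots deliberately start out different. */
static void boot_modes_init(struct boot_modes *bm)
{
	bm->mode[0] = BOOT_CPU_MODE_EL2;
	bm->mode[1] = BOOT_CPU_MODE_EL1;
}

/*
 * Each CPU records the exception level it entered the kernel at:
 * an EL1 entry overwrites slot 0, an EL2 entry overwrites slot 1.
 */
static void record_cpu_boot_mode(struct boot_modes *bm, uint32_t mode)
{
	if (mode == BOOT_CPU_MODE_EL2)
		bm->mode[1] = mode;
	else
		bm->mode[0] = mode;
}

/* The slots only agree if every CPU entered at the same level. */
static bool hyp_mode_mismatched(const struct boot_modes *bm)
{
	return bm->mode[0] != bm->mode[1];
}
```

If the crash CPU enters the crash kernel at EL1 while a secondary is brought
up at EL2, the slots end up as {EL1, EL2} and the mismatch check fires, which
would be consistent with the message above.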
Thanks,
Mark.