Re: [REGRESSION] ACPI: threaded SCI handler causes soft lockup in kdump capture kernel
From: Rafael J. Wysocki
Date: Tue Jun 02 2026 - 13:40:09 EST
On Tue, Jun 2, 2026 at 7:20 PM Mario Limonciello
<mario.limonciello@xxxxxxx> wrote:
> On 6/2/26 12:17, Rafael J. Wysocki wrote:
> > On Monday, June 1, 2026 10:29:08 PM CEST Mohan Yelugoti wrote:
> >> Hi,
> >>
> >> We ship SONiC[0] on top of Debian stable with a small set of patches
> >> on top (see [1]). After moving SONiC from Debian bookworm
> >> to trixie, we started seeing a regression where the kdump capture
> >> kernel hits a soft lockup while collecting the crash dump, so no
> >> vmcore is produced.
> >>
> >> We trigger the original panic in the kernel with:
> >>
> >> echo c > /proc/sysrq-trigger
> >>
> >> The capture kernel then boots and hangs. To capture a usable trace
> >> rather than waiting indefinitely, I added the following to the
> >> capture-kernel command line so that it panics on soft lockup (and
> >> dumps backtraces from all CPUs) when the hang is detected:
> >>
> >> debug=1 loglevel=7 softlockup_all_cpu_backtrace=1 softlockup_panic=1
> >>
> >> With those in place, the capture kernel prints:
> >>
> >> watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [irq/9-acpi:39]
> >> CPU: 0 UID: 0 PID: 39 Comm: irq/9-acpi Not tainted
> >> 6.12.41+deb13-sonic-amd64 #1 Debian 6.12.41-1
> >> RIP: 0010:acpi_os_read_port+0x30/0xa0
> >
> > I have no idea what could possibly lock up in that function, sorry.
> >
> > In this particular case it is reading a byte from the I/O space via inb(),
> > that's all.
> >
> > I'm wondering if it gets interrupted for some reason and then something
> > odd happens that prevents it from continuing.
> >
> >> Call Trace:
> >> <TASK>
> >> acpi_hw_gpe_read+0x61/0x80
> >> acpi_ev_detect_gpe+0x74/0x180
> >> acpi_ev_gpe_detect+0xe1/0x130
> >> acpi_ev_sci_xrupt_handler+0x1d/0x40
> >> acpi_irq+0x1c/0x40
> >> irq_thread_fn+0x23/0x60
> >> irq_thread+0x1b3/0x2f0
> >> kthread+0xd2/0x100
> >> ret_from_fork+0x34/0x50
> >> ret_from_fork_asm+0x1a/0x30
> >> </TASK>
> >> Kernel panic - not syncing: softlockup: hung tasks
> >>
> >> Between the bookworm and trixie kernels, the ACPI SCI handler was
> >> moved from a hardirq handler to a threaded handler by:
> >>
> >> 7a36b901a6eb ("ACPI: OSL: Use a threaded interrupt handler for SCI")
> >
> > Maybe it exposed some latent issue.
> >
> > It's too late to revert it from the mainline anyway.
> >
> >> To confirm this is the trigger, I reverted that commit on top of
> >> 6.12.41 and re-ran the same sysrq-trigger reproducer. The capture
> >> kernel then completes the crash dump without tripping the soft
> >> lockup watchdog.
> >>
> >> Happy to test patches or gather more data as this is reproducible
> >> consistently on our systems.
> >
> > Does it always occur at the same spot? If so, you could try to
> > instrument the code around it and look for clues.
>
> Especially if it's just a latent issue that got exposed, before spending
> too much time instrumenting, it's probably worth cross-referencing
> 7.1-rc6 as well to see if the same issue occurs there.

Good idea.

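For the "same spot" question, a rough sketch of how the RIP lines from several capture-kernel logs could be tallied. It assumes the logs contain lines in the same "RIP: 0010:symbol+0xoff/0xlen" form as the trace quoted above; the helper name lockup_sites() is made up for illustration and the sample text is just the two lines from the report.

```python
import re
from collections import Counter

# Matches lines such as "RIP: 0010:acpi_os_read_port+0x30/0xa0".
# The symbol may contain dots (e.g. compiler-generated suffixes).
RIP_RE = re.compile(
    r"RIP:\s+[0-9a-fA-F]{4}:([\w.]+)\+(0x[0-9a-fA-F]+)/(0x[0-9a-fA-F]+)"
)


def lockup_sites(log_text):
    """Count (symbol, offset) pairs over every RIP line found in log_text.

    Hypothetical helper: it only looks at RIP lines and ignores the rest
    of the backtrace.
    """
    return Counter(
        (m.group(1), int(m.group(2), 16)) for m in RIP_RE.finditer(log_text)
    )


# Sample input taken from the report; real use would read several logs.
sample = """\
watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [irq/9-acpi:39]
RIP: 0010:acpi_os_read_port+0x30/0xa0
"""

sites = lockup_sites(sample)
for (symbol, offset), count in sites.items():
    print(f"{symbol}+{offset:#x}: {count}")
```

If every run ends up under a single symbol+offset key, that is the place to instrument first.
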
> >
> >> This issue has been raised in the SONiC project at [2].
> >>
> >> Context
> >> -------
> >> - Kernel: 6.12.41+deb13-sonic-amd64 (Debian 6.12.41-1);
> >> - Last known good: Debian bookworm kernel (6.1.123), predates
> >> 7a36b901a6eb.
> >>
> >> - Default capture-kernel cmdline (from /etc/default/kdump-tools,
> >> KDUMP_CMDLINE_APPEND):
> >>
> >> irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service
> >> ata_piix.prefer_ms_hyperv=0 panic=10 debug loglevel=4
> >> hpet=disable pcie_port=compat pci=nommconf

Also, I would try to remove irqpoll from this command line and retest.

> >> For the trace above, loglevel was raised from 4 to 7 and the
> >> following were added:
> >>
> >> debug=1 softlockup_all_cpu_backtrace=1 softlockup_panic=1
> >>
> >> [0]: https://github.com/sonic-net
> >> [1]: https://github.com/sonic-net/sonic-linux-kernel/tree/master/patches-sonic
> >> [2]: https://github.com/sonic-net/sonic-linux-kernel/pull/580
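
Regarding the irqpoll retest suggested above, a minimal sketch of producing the reduced append string from the default one quoted in the report. The function name drop_cmdline_param() is hypothetical, and it assumes plain whitespace-separated parameters (no quoted values containing spaces); the result would still have to be put into KDUMP_CMDLINE_APPEND by hand.

```python
def drop_cmdline_param(cmdline, name):
    """Return cmdline without any bare 'name' or 'name=value' token.

    Hypothetical helper; assumes parameters are separated by whitespace
    and that no value contains quoted spaces.
    """
    kept = [
        tok
        for tok in cmdline.split()
        if tok != name and not tok.startswith(name + "=")
    ]
    return " ".join(kept)


# Default append string as quoted from /etc/default/kdump-tools in the report.
default_append = (
    "irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service "
    "ata_piix.prefer_ms_hyperv=0 panic=10 debug loglevel=4 "
    "hpet=disable pcie_port=compat pci=nommconf"
)

reduced = drop_cmdline_param(default_append, "irqpoll")
print(reduced)
```

All other parameters are left in their original order, so the only difference between the two runs is irqpoll itself.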