Re: [REGRESSION] ACPI: threaded SCI handler causes soft lockup in kdump capture kernel

From: Rafael J. Wysocki

Date: Tue Jun 02 2026 - 13:25:04 EST


On Monday, June 1, 2026 10:29:08 PM CEST Mohan Yelugoti wrote:
> Hi,
>
> We ship SONiC[0] on top of Debian stable with a small set of patches
> on top (see [1]). After moving SONiC from Debian bookworm
> to trixie, we started seeing a regression where the kdump capture
> kernel hits a soft lockup while collecting the crash dump, so no
> vmcore is produced.
>
> We trigger the original panic in the kernel with:
>
> echo c > /proc/sysrq-trigger
>
> The capture kernel then boots and hangs. To capture a usable trace
> rather than waiting indefinitely, I added the following to the
> capture-kernel command line so that it panics on soft lockup (and
> dumps backtraces from all CPUs) when the hang is detected:
>
> debug=1 loglevel=7 softlockup_all_cpu_backtrace=1 softlockup_panic=1
>
> With those in place, the capture kernel prints:
>
> watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [irq/9-acpi:39]
> CPU: 0 UID: 0 PID: 39 Comm: irq/9-acpi Not tainted 6.12.41+deb13-sonic-amd64 #1 Debian 6.12.41-1
> RIP: 0010:acpi_os_read_port+0x30/0xa0

I have no idea what could possibly lock up in that function, sorry.

In this particular case, it just reads a byte from the I/O port space
via inb(), that's all.

I'm wondering if it gets interrupted for some reason and then something
odd happens that prevents it from continuing.

> Call Trace:
> <TASK>
> acpi_hw_gpe_read+0x61/0x80
> acpi_ev_detect_gpe+0x74/0x180
> acpi_ev_gpe_detect+0xe1/0x130
> acpi_ev_sci_xrupt_handler+0x1d/0x40
> acpi_irq+0x1c/0x40
> irq_thread_fn+0x23/0x60
> irq_thread+0x1b3/0x2f0
> kthread+0xd2/0x100
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> Kernel panic - not syncing: softlockup: hung tasks
>
> Between the bookworm and trixie kernels, the ACPI SCI handler was
> moved from a hardirq handler to a threaded handler by:
>
> 7a36b901a6eb ("ACPI: OSL: Use a threaded interrupt handler for SCI")

Maybe it exposed some latent issue.

It's too late to revert it in mainline anyway.

> To confirm this is the trigger, I reverted that commit on top of
> 6.12.41 and re-ran the same sysrq-trigger reproducer. The capture
> kernel then completes the crash dump without tripping the soft
> lockup watchdog.
>
> Happy to test patches or gather more data as this is reproducible
> consistently on our systems.

Does it always occur at the same spot? If so, you could try to
instrument the code around it and look for clues.
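
As a rough way to answer the "same spot" question before touching the
kernel, the saved capture-kernel console logs from several runs could be
tallied by the symbol and offset on their RIP lines. The following is only
a sketch: the helper name tally_rip_sites() and the regular expression are
made up for illustration, and it assumes the logs contain RIP lines in the
same "RIP: 0010:symbol+0xoff/0xsize" form as the report quoted above.

```python
import re
from collections import Counter

# Matches kernel-mode RIP lines such as
#   "RIP: 0010:acpi_os_read_port+0x30/0xa0"
# (optionally prefixed by "> " quoting or a timestamp).
# The pattern is an assumption based on the report in this thread.
RIP_RE = re.compile(
    r"RIP:\s+[0-9a-fA-F]{4}:(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-fA-F]+)/0x[0-9a-fA-F]+"
)


def tally_rip_sites(log_text):
    """Count how often each (symbol, offset) pair shows up on RIP lines."""
    sites = Counter()
    for line in log_text.splitlines():
        match = RIP_RE.search(line)
        if match:
            sites[(match.group("sym"), int(match.group("off"), 16))] += 1
    return sites


def format_sites(sites):
    """Render the tally, most frequent site first, one site per line."""
    return [
        f"{count:4d}  {sym}+{off:#x}"
        for (sym, off), count in sites.most_common()
    ]
```

Feeding it the concatenated logs of a handful of reproductions and printing
format_sites() would show whether every lockup is reported at
acpi_os_read_port+0x30 or whether the location moves around within the
GPE detection path.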

> This issue has been raised in the SONiC project at [2].
>
> Context
> -------
> - Kernel: 6.12.41+deb13-sonic-amd64 (Debian 6.12.41-1);
> - Last known good: Debian bookworm kernel (6.1.123), predates
> 7a36b901a6eb.
>
> - Default capture-kernel cmdline (from /etc/default/kdump-tools,
> KDUMP_CMDLINE_APPEND):
>
> irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service
> ata_piix.prefer_ms_hyperv=0 panic=10 debug loglevel=4
> hpet=disable pcie_port=compat pci=nommconf
>
> For the trace above, loglevel was raised from 4 to 7 and the
> following were added:
>
> debug=1 softlockup_all_cpu_backtrace=1 softlockup_panic=1
>
> [0]: https://github.com/sonic-net
> [1]: https://github.com/sonic-net/sonic-linux-kernel/tree/master/patches-sonic
> [2]: https://github.com/sonic-net/sonic-linux-kernel/pull/580
>
>