Re: [REGRESSION] ACPI: threaded SCI handler causes soft lockup in kdump capture kernel
From: Mario Limonciello
Date: Tue Jun 02 2026 - 13:26:38 EST
On 6/2/26 12:17, Rafael J. Wysocki wrote:
On Monday, June 1, 2026 10:29:08 PM CEST Mohan Yelugoti wrote:
Hi,
We ship SONiC[0] on Debian stable with a small set of patches applied
on top (see [1]). After moving SONiC from Debian bookworm
to trixie, we started seeing a regression where the kdump capture
kernel hits a soft lockup while collecting the crash dump, so no
vmcore is produced.
We trigger the original panic in the kernel with:
echo c > /proc/sysrq-trigger
The capture kernel then boots and hangs. To capture a usable trace
rather than waiting indefinitely, I added the following to the
capture-kernel command line so that it panics on soft lockup (and
dumps backtraces from all CPUs) when the hang is detected:
debug=1 loglevel=7 softlockup_all_cpu_backtrace=1 softlockup_panic=1
With those in place, the capture kernel prints:
watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [irq/9-acpi:39]
CPU: 0 UID: 0 PID: 39 Comm: irq/9-acpi Not tainted
6.12.41+deb13-sonic-amd64 #1 Debian 6.12.41-1
RIP: 0010:acpi_os_read_port+0x30/0xa0
I have no idea what could possibly lock up in that function, sorry.
In this particular case it is reading a byte from the I/O space via inb(),
that's all.
I'm wondering if it gets interrupted for some reason and then something
odd happens that prevents it from continuing.
Call Trace:
<TASK>
acpi_hw_gpe_read+0x61/0x80
acpi_ev_detect_gpe+0x74/0x180
acpi_ev_gpe_detect+0xe1/0x130
acpi_ev_sci_xrupt_handler+0x1d/0x40
acpi_irq+0x1c/0x40
irq_thread_fn+0x23/0x60
irq_thread+0x1b3/0x2f0
kthread+0xd2/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Kernel panic - not syncing: softlockup: hung tasks
Between the bookworm and trixie kernels, the ACPI SCI handler was
moved from a hardirq handler to a threaded handler by:
7a36b901a6eb ("ACPI: OSL: Use a threaded interrupt handler for SCI")
Maybe it exposed some latent issue.
It's too late to revert it from the mainline anyway.
To confirm this is the trigger, I reverted that commit on top of
6.12.41 and re-ran the same sysrq-trigger reproducer. The capture
kernel then completes the crash dump without tripping the soft
lockup watchdog.
Happy to test patches or gather more data as this is reproducible
consistently on our systems.
Does it always occur at the same spot? If so, you could try to
instrument the code around it and look for clues.
Especially if it's just a latent issue that got exposed, before spending
too much time instrumenting it's probably worth cross-referencing 7.1-rc6
as well to see if the same issue can occur there.
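One rough way to answer the "same spot" question before instrumenting
anything is to tally the RIP lines across several capture-kernel console
logs. The sketch below assumes the logs have been saved as plain text
files (for example from a serial console); the function name and the
file handling are illustrative only and are not part of kdump-tools.

```python
import re
import sys
from collections import Counter

# Matches kernel-mode RIP lines such as:
#   RIP: 0010:acpi_os_read_port+0x30/0xa0
RIP_RE = re.compile(
    r"RIP:\s+[0-9a-fA-F]{4}:(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-fA-F]+)/0x[0-9a-fA-F]+"
)


def lockup_sites(log_text):
    """Count (symbol, offset) pairs seen on RIP lines in one log."""
    return Counter(
        (m.group("sym"), m.group("off")) for m in RIP_RE.finditer(log_text)
    )


if __name__ == "__main__":
    # Hypothetical usage: pass one saved console log per reproduction run.
    total = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            total += lockup_sites(f.read())
    for (sym, off), count in total.most_common():
        print(f"{count:4d}  {sym}+{off}")
```

If every run reports the same symbol and offset, that points at one
specific port read; a spread of offsets would suggest the thread is
simply being caught somewhere inside a longer loop.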
This issue has been raised in the SONiC project at [2].
Context
-------
- Kernel: 6.12.41+deb13-sonic-amd64 (Debian 6.12.41-1).
- Last known good: Debian bookworm kernel (6.1.123), predates
7a36b901a6eb.
- Default capture-kernel cmdline (from /etc/default/kdump-tools,
KDUMP_CMDLINE_APPEND):
irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service
ata_piix.prefer_ms_hyperv=0 panic=10 debug loglevel=4
hpet=disable pcie_port=compat pci=nommconf
For the trace above, loglevel was raised from 4 to 7 and the
following were added:
debug=1 softlockup_all_cpu_backtrace=1 softlockup_panic=1
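For anyone trying to reproduce this, the effective set of parameters
used for the trace can be derived from the default plus the overrides
listed above. This is only a sketch: merge_cmdline() is a hypothetical
helper, and treating "debug" and "debug=1" as the same key is an
assumption made here for readability, not something kdump-tools does.

```python
def merge_cmdline(base, extra):
    """Merge two kernel command-line strings.

    A parameter in `extra` replaces the same-named parameter in `base`
    in place; parameters that are new are appended at the end.
    """
    params = {}
    for token in (base + " " + extra).split():
        key, _, _ = token.partition("=")
        params[key] = token  # dicts keep first-insertion order
    return " ".join(params.values())


DEFAULT = (
    "irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service "
    "ata_piix.prefer_ms_hyperv=0 panic=10 debug loglevel=4 "
    "hpet=disable pcie_port=compat pci=nommconf"
)
DEBUG_EXTRA = (
    "debug=1 loglevel=7 softlockup_all_cpu_backtrace=1 softlockup_panic=1"
)

if __name__ == "__main__":
    print(merge_cmdline(DEFAULT, DEBUG_EXTRA))
```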
[0]: https://github.com/sonic-net
[1]: https://github.com/sonic-net/sonic-linux-kernel/tree/master/patches-sonic
[2]: https://github.com/sonic-net/sonic-linux-kernel/pull/580