Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.

From: Ben Greear

Date: Tue Mar 03 2026 - 19:07:09 EST


On 3/3/26 13:54, Tejun Heo wrote:
Hello,

On Tue, Mar 03, 2026 at 01:40:54PM -0800, Ben Greear wrote:
If I use a kthread to do the blocking reg_todo work, then the problem
goes away, so it does appear that the work flush logic down in swap.c
is being blocked by the reg_todo work item, rather than the swap.c
logic blocking against itself.

My kthread hack left the reg_todo work item logic in place, but instead of
the work item doing any blocking work, it instead just wakes the kthread
I added and has that kthread do the work under mutex.
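
For readers following along, the shape of that hack can be sketched as a
userspace analogy. This is not the kernel patch (which is not shown in this
thread); DeferredWorker, work_item and blocking_fn are hypothetical names, a
Python thread stands in for the kthread, and threading.Lock stands in for
the mutex.

```python
import threading


class DeferredWorker:
    """Dedicated thread that does blocking work on behalf of a callback
    that must not block (the analogue of the work item)."""

    def __init__(self, blocking_fn):
        self._blocking_fn = blocking_fn
        self._lock = threading.Lock()    # stands in for the mutex
        self._wake = threading.Event()   # stands in for waking the kthread
        self._stop = False
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def work_item(self):
        # The "work item" no longer blocks: it only wakes the thread.
        self._wake.set()

    def _loop(self):
        while True:
            self._wake.wait()
            self._wake.clear()
            if self._stop:
                return
            # All blocking work happens here, under the lock.
            with self._lock:
                self._blocking_fn()

    def stop(self):
        self._stop = True
        self._wake.set()
        self._thread.join()
```

Wakeups that arrive while the thread is busy are coalesced into one more
pass, which is fine as long as blocking_fn drains everything pending.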

The second regulatory related work item in net/wireless/ causes the same
lockup, though it was harder to reproduce. Putting that work in the kthread
also seems to have fixed it.

I could only ever reproduce this with KASAN (plus lockdep and other debugging
options) enabled. My guess is that this is because the system then runs slower
and/or there is more memory pressure.

I should still be able to reproduce this if I switch to upstream kernel, so
if there is any debugging code you'd like me to execute, I will attempt to
do so.

I think the main thing is finding out what state the work item is in. Is it
pending, running, or finished? You can enable wq tracepoints to figure that
out, or if you can take a crashdump when it's stalled, nowadays it's really
easy to tell the state w/ something like Claude Code and drgn. Just tell
Claude to use drgn to look at the crashdump and ask it to locate the work
item and what it's doing. It works surprisingly well.
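
As a rough sketch of the tracepoint route: once the workqueue events are
enabled and the trace output is saved, a small script can report the last
seen state of each work item. The event names below are the standard
workqueue tracepoints; the exact line format varies between kernel versions,
so the regex is an assumption that may need adjusting, and work_states is a
hypothetical helper name.

```python
import re

# Pulls the event name and work_struct address out of a trace line.
# Tolerates both "work struct=ADDR" and "work struct ADDR:" spellings.
EVENT_RE = re.compile(r"(workqueue_\w+): work struct[ =]([0-9a-fA-Fx]+)")

# State implied by the most recent event seen for a work item.
STATE_AFTER = {
    "workqueue_queue_work": "pending",
    "workqueue_activate_work": "pending",
    "workqueue_execute_start": "running",
    "workqueue_execute_end": "finished",
}


def work_states(lines):
    """Map each work_struct address to its last observed state."""
    states = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match and match.group(1) in STATE_AFTER:
            states[match.group(2)] = STATE_AFTER[match.group(1)]
    return states
```

A work item still "running" at the time of the stall was started but never
finished; one still "pending" never got picked up by a worker.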

Could the logic that detects blocked work-queues instead be instrumented
to print out more useful information so that just reproducing the problem
and providing dmesg output will be sufficient? Or does dmesg already provide
enough that would give you a clue as to what is going on?

If I were to attempt to use AI on the coredump, would echoing 'c' to /proc/sysrq-trigger
with kdump enabled (while the deadlock is happening) be the appropriate way to grab the core file?

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com