Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.

From: Johannes Berg

Date: Tue Mar 03 2026 - 16:08:48 EST

On Tue, 2026-03-03 at 10:52 -1000, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 03, 2026 at 12:49:24PM +0100, Johannes Berg wrote:
> > Fair. I don't know, I don't think there's anything that even shows that
> > there's a dependency between the two workqueues and the
> > "((wq_completion)events_unbound)" and "((wq_completion)events)", and
> > there would have to be for it to deadlock this way because of that?
> >
> > But one is mm_percpu_wq and the other is system_percpu_wq.
> >
> > Tejun, does the workqueue code somehow introduce a dependency between
> > different per-CPU workqueues that's not modelled in lockdep?
>
> Hopefully not. Kinda late to the party.

Yeah, sorry, should've included a link:
https://lore.kernel.org/linux-wireless/fa4e82ee-eb14-3930-c76c-f3bd59c5f258@xxxxxxxxxxxxxxx/

> Why isn't mm_percpu_wq making
> forward progress? That should in all circumstances. What's the work item and
> kworker doing?

So it seems that first iwlwifi is holding the RTNL:

ieee80211_open+0x62/0xe0 [mac80211]
__dev_open+0x11a/0x2e0
__dev_change_flags+0x1f8/0x280
netif_change_flags+0x22/0x60
do_setlink.isra.0+0xe57/0x11a0
rtnl_newlink+0x7e8/0xb50

(This is the last stack trace at the above link.)
All of this definitely happens with the RTNL held, although I haven't
checked which function in this stack actually acquires it.

Simultaneously the kworker/6:0 is stuck in reg_todo(), trying to acquire
the RTNL.

So far that seems fairly normal. kworker/6:0 is running reg_todo(), the
handler for reg_work in net/wireless/reg.c, which is queued on
system_percpu_wq (simply via schedule_work()).
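
As a rough userspace model of the state described so far (this is not
kernel code; rtnl_model, reg_todo_model() and the helper function are
made-up names, and a pthread mutex merely stands in for the RTNL), the
worker cannot take the lock while the open path holds it. The model uses
trylock so that it terminates; the real reg_todo() would sleep instead.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the RTNL; not the kernel's rtnl_mutex. */
static pthread_mutex_t rtnl_model = PTHREAD_MUTEX_INITIALIZER;

/* Models the worker: tries to take the lock and records the result. */
static void *reg_todo_model(void *arg)
{
	int *rc = arg;

	*rc = pthread_mutex_trylock(&rtnl_model);
	if (*rc == 0)
		pthread_mutex_unlock(&rtnl_model);
	return NULL;
}

/*
 * Models the open path holding the lock while the worker runs.
 * Returns 1 if the worker found the lock busy, 0 if it got the lock,
 * and -1 if the worker thread could not be created.
 */
int reg_todo_blocked_while_open_holds_rtnl(void)
{
	pthread_t worker;
	int rc = -1;

	pthread_mutex_lock(&rtnl_model);	/* "ieee80211_open" side */
	if (pthread_create(&worker, NULL, reg_todo_model, &rc) != 0) {
		pthread_mutex_unlock(&rtnl_model);
		return -1;
	}
	pthread_join(worker, NULL);
	pthread_mutex_unlock(&rtnl_model);

	assert(rc == 0 || rc == EBUSY);
	return rc == EBUSY;
}
```

On its own this only shows a worker waiting for a lock holder, which is
the normal part of the picture.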

Now iwlwifi is also trying to allocate coherent DMA memory (continuing
the stack trace), potentially a significant chunk for firmware loading:

dma_direct_alloc+0x7b/0x250
dma_alloc_attrs+0xa1/0x2a0
_iwl_pcie_ctxt_info_dma_alloc_coherent+0x31/0xb0 [iwlwifi]
iwl_pcie_ctxt_info_alloc_dma+0x20/0x50 [iwlwifi]
iwl_pcie_init_fw_sec+0x2fc/0x380 [iwlwifi]
iwl_pcie_ctxt_info_v2_alloc+0x19e/0x530 [iwlwifi]
iwl_trans_pcie_gen2_start_fw+0x2e2/0x820 [iwlwifi]
iwl_trans_start_fw+0x77/0x90 [iwlwifi]
iwl_mld_load_fw_wait_alive+0x97/0x2c0 [iwlmld]
iwl_mld_load_fw+0x91/0x240 [iwlmld]
iwl_mld_start_fw+0x44/0x470 [iwlmld]
iwl_mld_mac80211_start+0x3d/0x1b0 [iwlmld]
drv_start+0x6f/0x1d0 [mac80211]
ieee80211_do_open+0x2d6/0x960 [mac80211]
ieee80211_open+0x62/0xe0 [mac80211]

This is fine, but then it gets into __flush_work() in
__lru_add_drain_all():

__flush_work+0x34e/0x530
__lru_add_drain_all+0x19b/0x220
alloc_contig_range_noprof+0x1de/0x8a0
__cma_alloc+0x1f1/0x6a0
__dma_direct_alloc_pages.isra.0+0xcb/0x2f0
dma_direct_alloc+0x7b/0x250

which is because __lru_add_drain_all() queues a bunch of work items, one
for each CPU, onto the mm_percpu_wq and then waits for them to complete.
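
The queue-then-wait pattern can be sketched as a userspace model. This
is only an illustration under stated assumptions, not the mm code:
plain threads stand in for per-CPU kworkers, and NR_CPUS_MODEL, struct
drain_work, drain_all_model() and remaining_batches() are hypothetical
names. It shows that the caller cannot return until every per-CPU item
has run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for this model */

/* One pending "drain" item per modelled CPU. */
struct drain_work {
	pthread_mutex_t lock;
	pthread_cond_t done_cv;
	bool done;
	int lru_batch;		/* stand-in for a per-CPU LRU batch */
};

static struct drain_work works[NR_CPUS_MODEL];

/* Work function: empties the batch, then signals completion. */
static void *drain_worker(void *arg)
{
	struct drain_work *w = arg;

	pthread_mutex_lock(&w->lock);
	w->lru_batch = 0;
	w->done = true;
	pthread_cond_signal(&w->done_cv);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Queue one item per CPU, then wait for each; returns items flushed. */
int drain_all_model(void)
{
	pthread_t tid[NR_CPUS_MODEL];
	int flushed = 0;

	/* Phase 1: queue the work (modelled by starting a thread). */
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		struct drain_work *w = &works[cpu];
		int rc;

		pthread_mutex_init(&w->lock, NULL);
		pthread_cond_init(&w->done_cv, NULL);
		w->done = false;
		w->lru_batch = cpu + 1;
		rc = pthread_create(&tid[cpu], NULL, drain_worker, w);
		assert(rc == 0);
		(void)rc;
	}

	/* Phase 2: "flush" each item; blocks until that item is done. */
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		struct drain_work *w = &works[cpu];

		pthread_mutex_lock(&w->lock);
		while (!w->done)
			pthread_cond_wait(&w->done_cv, &w->lock);
		pthread_mutex_unlock(&w->lock);
		pthread_join(tid[cpu], NULL);
		pthread_cond_destroy(&w->done_cv);
		pthread_mutex_destroy(&w->lock);
		flushed++;
	}
	return flushed;
}

/* Sum of all batches; zero once every item has run. */
int remaining_batches(void)
{
	int sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += works[cpu].lru_batch;
	return sum;
}
```

In this model, a single item that never gets to run would leave the
caller stuck in phase 2 indefinitely.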

Conceptually, I see nothing wrong with this, hence my question; Ben says
that the system stops making progress at this point.

johannes