Re: [REGRESSION] 6.12: Workqueue lockups in inode_switch_wbs_work_fn (suspect commit 66c14dccd810)

From: Matt Fleming

Date: Tue Jan 13 2026 - 06:46:38 EST


On Mon, Jan 12, 2026 at 06:04:50PM +0100, Jan Kara wrote:
>
> I agree we are CPU bound in inode_switch_wbs_work_fn() but I don't think we
> are really hogging the CPU. The backtrace below indicates the worker just
> got rescheduled in cond_resched() to give other tasks a chance to run. Is
> the machine dying completely or does it eventually finish the cgroup
> teardown?

Yeah, you're right, the CPU isn't hogged, but the interaction with the
workqueue subsystem leads to the machine choking. I've seen 150+
instances of inode_switch_wbs_work_fn() queued up in the workqueue
subsystem:

[1437017.446174][ C0] in-flight: 3139338:inode_switch_wbs_work_fn ,2420392:inode_switch_wbs_work_fn ,2914179:inode_switch_wbs_work_fn
[1437017.446181][ C0] pending: 11*inode_switch_wbs_work_fn
[1437017.446185][ C0] pwq 6: cpus=1 node=0 flags=0x2 nice=0 active=23 refcnt=24
[1437017.446186][ C0] in-flight: 2723771:inode_switch_wbs_work_fn ,1710617:inode_switch_wbs_work_fn ,3228683:inode_switch_wbs_work_fn ,3149692:inode_switch_wbs_work_fn ,3224195:inode_switch_wbs_work_fn
[1437017.446193][ C0] pending: 18*inode_switch_wbs_work_fn
[1437017.446195][ C0] pwq 10: cpus=2 node=0 flags=0x2 nice=0 active=17 refcnt=18
[1437017.446196][ C0] in-flight: 3224135:inode_switch_wbs_work_fn ,3193118:inode_switch_wbs_work_fn ,3224106:inode_switch_wbs_work_fn ,3228725:inode_switch_wbs_work_fn ,3087195:inode_switch_wbs_work_fn ,1853835:inode_switch_wbs_work_fn
[1437017.446204][ C0] pending: 11*inode_switch_wbs_work_fn
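To total up in-flight and pending items per pwq from a dump like the one
above, a rough Python sketch could look like the following. It is not part
of the kernel or any existing tool; the function name tally() and the
regexes are assumptions based only on the line format shown above, so
other dump formats may need adjusting.

```python
import re
from collections import defaultdict

# Assumed line shapes, taken from the dump above:
#   "... pwq 6: cpus=1 node=0 ..."
#   "... in-flight: <pid>:<fn> ,<pid>:<fn> ..."
#   "... pending: <N>*<fn>, <fn> ..."
PWQ_RE = re.compile(r"pwq (\d+):")
INFLIGHT_RE = re.compile(r"\d+:\w+")
PENDING_RE = re.compile(r"(?:(\d+)\*)?\S+")


def tally(lines):
    """Return {pwq_id: {"in_flight": n, "pending": n}} for a workqueue dump."""
    counts = defaultdict(lambda: {"in_flight": 0, "pending": 0})
    current = "unknown"  # entries seen before any "pwq N:" header
    for line in lines:
        header = PWQ_RE.search(line)
        if header:
            current = header.group(1)
            continue
        if "in-flight:" in line:
            # Drop the timestamp prefix, then count "<pid>:<fn>" entries.
            body = line.split("in-flight:", 1)[1]
            counts[current]["in_flight"] += len(INFLIGHT_RE.findall(body))
        elif "pending:" in line:
            body = line.split("pending:", 1)[1]
            for item in body.split(","):
                item = item.strip()
                if not item:
                    continue
                m = PENDING_RE.match(item)
                # "11*fn" counts as 11 items, a bare "fn" counts as 1.
                counts[current]["pending"] += int(m.group(1)) if m.group(1) else 1
    return dict(counts)
```

Feeding it the saved dump line by line, e.g. tally(open(path)) with path
being wherever the console output was captured, gives per-pwq counts that
can then be summed.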

It sometimes finishes the cgroup teardown and sometimes hard locks up.
When workqueue items aren't completing, things get really bad :)

> Well, these changes were introduced because some services are switching
> over 1m inodes on their exit and they were softlocking up the machine :).
> So there's some commonality, just something in that setup behaves
> differently from your setup. Are the inodes clean, dirty, or only with
> dirty timestamps?

Good question. I don't know but I'll get back to you.

> Also since you mention 6.12 kernel but this series was
> only merged in 6.18, do you carry full series ending with merge commit
> 9426414f0d42f?

We always run the latest 6.12 LTS release and it looks like only these
two commits got backported:

9a6ebbdbd412 ("writeback: Avoid excessively long inode switching times")
66c14dccd810 ("writeback: Avoid softlockup when switching many inodes")