Re: [patch V5 00/20] sched: Rewrite MM CID management

From: Shrikanth Hegde

Date: Wed Jan 28 2026 - 08:05:19 EST




On 1/28/26 5:27 PM, Thomas Gleixner wrote:
> On Tue, Jan 27 2026 at 16:01, Ihor Solodrai wrote:
>> BPF CI caught a deadlock on current bpf-next tip (35538dba51b4).
>> Job: https://github.com/kernel-patches/bpf/actions/runs/21417415035/job/61670254640
>>
>> It appears to be related to this series. Pasting a splat below.
>
> The deadlock splat is completely unrelated, as it is a consequence of the
> panic which is triggered by the watchdog:
>
> [ 45.009755] watchdog: CPU2: Watchdog detected hard LOCKUP on cpu 2
>
> ...
>
> [ 46.053170] lock(&nmi_desc[NMI_LOCAL].lock);
> [ 46.053172] <Interrupt>
> [ 46.053173] lock(&nmi_desc[NMI_LOCAL].lock);
>
> ...
>
>> Any ideas what might be going on?
>
> Without a full backtrace of all CPUs it's hard to tell, because it's
> unclear what is holding the runqueue lock of CPU2 long enough to trigger
> the hard lockup watchdog.
>
> I'm pretty sure the CID changes are unrelated; that new code just happens
> to show up as the messenger which gets stuck on the lock forever.
>
> [ 46.053209] CPU: 2 UID: 0 PID: 126 Comm: test_progs Tainted: G OE 6.19.0-rc5-g748c6d52700a-dirty #1 PREEMPT(full)
> [ 46.053214] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [ 46.053215] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [ 46.053217] Call Trace:
> [ 46.053220] <NMI>
> [ 46.053223] dump_stack_lvl+0x5d/0x80
> [ 46.053227] print_usage_bug.part.0+0x22b/0x2c0
> [ 46.053231] lock_acquire+0x272/0x2b0
> [ 46.053235] ? __register_nmi_handler+0x83/0x350
> [ 46.053240] _raw_spin_lock_irqsave+0x39/0x60
> [ 46.053242] ? __register_nmi_handler+0x83/0x350
> [ 46.053246] __register_nmi_handler+0x83/0x350
> [ 46.053250] native_stop_other_cpus+0x31c/0x460
> [ 46.053255] ? __pfx_native_stop_other_cpus+0x10/0x10
> [ 46.053260] vpanic+0x1c5/0x3f0
>
> vpanic() really should disable lockdep here before taking that lock in
> NMI context. The resulting lockdep splat is not really useful.
>
> Thanks.
>
> tglx

Hi Thomas, Peter.


I remember running into this panic once, but it wasn't consistent and I
couldn't hit it again. The setup had vCPU overcommit and a fair bit of steal time.


The traces from different CPUs looked like the ones below.
------------------------

watchdog: CPU 23 self-detected hard LOCKUP @ mm_get_cid+0xe8/0x188
watchdog: CPU 23 TB:1434903268401795, last heartbeat TB:1434897252302837 (11750ms ago)
NIP [c0000000001b7134] mm_get_cid+0xe8/0x188
LR [c0000000001b7154] mm_get_cid+0x108/0x188
Call Trace:
[c000000004c37db0] [c000000001145d84] cpuidle_enter_state+0xf8/0x6a4 (unreliable)
[c000000004c37e00] [c0000000001b95ac] mm_cid_switch_to+0x3c4/0x52c
[c000000004c37e60] [c000000001147264] __schedule+0x47c/0x700
[c000000004c37ee0] [c000000001147a70] schedule_idle+0x3c/0x64
[c000000004c37f10] [c0000000001f6d70] do_idle+0x160/0x1b0
[c000000004c37f60] [c0000000001f7084] cpu_startup_entry+0x48/0x50
[c000000004c37f90] [c00000000005f570] start_secondary+0x284/0x288
[c000000004c37fe0] [c00000000000e158] start_secondary_prolog+0x10/0x14


watchdog: CPU 11 self-detected hard LOCKUP @ plpar_hcall_norets_notrace+0x18/0x2c
watchdog: CPU 11 TB:1434903340004919, last heartbeat TB:1434897249749892 (11895ms ago)
NIP [c0000000000f84fc] plpar_hcall_norets_notrace+0x18/0x2c
LR [c000000001152588] queued_spin_lock_slowpath+0xd88/0x15d0
Call Trace:
[c00000056b69fb10] [c00000056b69fba0] 0xc00000056b69fba0 (unreliable)
[c00000056b69fc30] [c000000001153ce0] _raw_spin_lock+0x80/0xa0
[c00000056b69fc50] [c0000000001b9a34] raw_spin_rq_lock_nested+0x3c/0xf8
[c00000056b69fc80] [c0000000001b9bb8] mm_cid_fixup_cpus_to_tasks+0xc8/0x28c
[c00000056b69fd00] [c0000000001bff34] sched_mm_cid_exit+0x108/0x22c
[c00000056b69fd40] [c000000000167b08] do_exit+0xf4/0x5d0
[c00000056b69fdf0] [c00000000016800c] make_task_dead+0x0/0x178
[c00000056b69fe10] [c0000000000316c8] system_call_exception+0x128/0x390
[c00000056b69fe50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec


watchdog: CPU 65 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x10ec/0x15d0
watchdog: CPU 65 TB:1434905824977447, last heartbeat TB:1434899309522065 (12725ms ago)
NIP [c0000000011528ec] queued_spin_lock_slowpath+0x10ec/0x15d0
LR [c000000001152d0c] queued_spin_lock_slowpath+0x150c/0x15d0
Call Trace:
[c000000777e27a60] [0000000000000009] 0x9 (unreliable)
[c000000777e27b80] [c000000001153ce0] _raw_spin_lock+0x80/0xa0
[c000000777e27ba0] [c0000000001b9a34] raw_spin_rq_lock_nested+0x3c/0xf8
[c000000777e27bd0] [c0000000001babb8] ___task_rq_lock+0x64/0x140
[c000000777e27c20] [c0000000001c8294] wake_up_new_task+0x180/0x484
[c000000777e27ca0] [c00000000015bea4] kernel_clone+0x120/0x5bc
[c000000777e27d30] [c00000000015c4c0] __do_sys_clone+0x88/0xc8
[c000000777e27e10] [c0000000000316c8] system_call_exception+0x128/0x390
[c000000777e27e50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec




I am wondering if it is this loop in mm_get_cid() that may not be getting a cid
for a long time. Is that possible?

static inline unsigned int mm_get_cid(struct mm_struct *mm)
{
	unsigned int cid = __mm_get_cid(mm, READ_ONCE(mm->mm_cid.max_cids));

	while (cid == MM_CID_UNSET) {
		cpu_relax();
		cid = __mm_get_cid(mm, num_possible_cpus());
	}
	return cid;
}