Re: [patch 1/4] sched/mmcid: Prevent live lock on task to CPU mode transition

From: Mathieu Desnoyers

Date: Fri Jan 30 2026 - 10:25:02 EST


On 2026-01-29 16:20, Thomas Gleixner wrote:
Ihor reported a BPF CI failure which turned out to be a live lock in the
MM_CID management. The scenario is:

A test program creates the 4th child, which means the MM_CID users become

It would be clearer to talk in terms of threads, e.g. "creates the 5th
thread". AFAIR threads are "siblings", so I'm not sure that the
parent/child relationship really applies here.

more than the number of CPUs (four in this example), so it switches to per
CPU ownership mode.

At this point each live task of the program has a CID associated. Assume
thread creation order assignment for simplicity.

T0 (main thread) CID0 runs fork() and creates T4
T1 (1st child) CID1

2nd thread and so on...

T2 (2nd child) CID2
T3 (3rd child) CID3
T4 (4th child) --- not visible yet

T0 sets mm_cid::percpu = true and transfers it's own CID to CPU0 where it

its

runs on and then starts the fixup which walks through the threads to
transfer the per task CIDs either to the CPU the task is running on or drop
it back into the pool if the task is not on a CPU.

During that T1 - T3 are free to schedule in and out before the fixup caught
up with them. Going through all possible permutations with a python script
revealed a few problematic cases. The most trivial one is:

T1 schedules in on CPU1 and observes percpu == true, so it transfers
it's CID to CPU1

its


T1 is migrated to CPU1 and schedule in observes percpu == true, but

I think you mean "to CPU2" here.

CPU2 does not have a CID associated and T1 transferred it's own to

its

[...]
+ *
+ * Aside of that this mechanism also ensures RT compability:

compatibility

[...]
@@ -10596,11 +10628,13 @@ void sched_mm_cid_fork(struct task_struc
 		if (!percpu)
 			mm_cid_transit_to_task(current, pcp);
 		else
-			mm_cid_transfer_to_cpu(current, pcp);
+			mm_cid_transit_to_cpu(current, pcp);
 	}
 	if (percpu) {
 		mm_cid_fixup_tasks_to_cpus();
+		/* Clear the transition bit */
+		WRITE_ONCE(mm->mm_cid.transit, 0);

You should move this WRITE_ONCE to the end of
mm_cid_fixup_tasks_to_cpus() to keep the same pattern as for
mm_cid_fixup_cpus_to_tasks().
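
To illustrate the shape of that suggestion, here is a minimal userspace
sketch. It is not kernel code: struct mm_model, struct mm_cid_model, the
model_ prefixed functions and the simplified WRITE_ONCE() macro are all
stand-ins made up for this example, and the thread walk itself is elided.
The only point is that the fixup function clears the transition bit as its
last step, so the caller does not have to.

```c
#include <assert.h>

/* Simplified userspace stand-in for the kernel's WRITE_ONCE() (assumption). */
#define WRITE_ONCE(x, val) (*(volatile unsigned int *)&(x) = (val))

/* Hypothetical, heavily reduced model of the per-mm CID state. */
struct mm_cid_model {
	unsigned int transit;	/* non-zero while a mode transition is running */
	unsigned int percpu;	/* non-zero in per-CPU ownership mode */
};

struct mm_model {
	struct mm_cid_model mm_cid;
};

/* Models mm_cid_fixup_tasks_to_cpus() with the clear moved inside. */
static void model_fixup_tasks_to_cpus(struct mm_model *mm)
{
	/* In this model the fixup is only reached from the percpu branch. */
	assert(mm->mm_cid.percpu);

	/* ... walk the threads and transfer or drop their CIDs (elided) ... */

	/* Clear the transition bit as the final step of the fixup */
	WRITE_ONCE(mm->mm_cid.transit, 0);
}

/* Models the tail of sched_mm_cid_fork(): no WRITE_ONCE() left here. */
static void model_fork_tail(struct mm_model *mm)
{
	if (mm->mm_cid.percpu)
		model_fixup_tasks_to_cpus(mm);
}
```

With this layout any future caller of the fixup gets the bit cleared
without having to remember a separate step.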

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com