Re: [patch 4/4] sched/mmcid: Optimize transitional CIDs when scheduling out

From: Thomas Gleixner

Date: Fri Jan 30 2026 - 11:13:36 EST


On Fri, Jan 30 2026 at 10:50, Mathieu Desnoyers wrote:
> On 2026-01-29 16:20, Thomas Gleixner wrote:
>> During the investigation of the various transition mode issues
>> instrumentation revealed that the amount of bitmap operations can be
>> significantly reduced when a task with a transitional CID schedules out
>> after the fixup function completed and disabled the transition mode.
>>
>> At that point the mode is stable and therefore it is not required to drop
>> the transitional CID back into the pool. As the fixup is complete the
>> potential exhaustion of the CID pool is no longer possible, so the CID can
>> be transferred to the scheduling out task or to the CPU depending on the
>> current ownership mode. This is now possible because mm_cid::mode contains
>> both the ownership state and the transition bit so the racy snapshot is
>> valid under all circumstances because a subsequent modification of the
>> mode is serialized by the corresponding runqueue lock.
>
> AFAIU the mc->mode updates are serialized by the mm->mm_cid.lock
> and not the runqueue locks. What am I missing ?

Actually the mode updates are serialized by the mutex. They happen under
the lock as well, but the lock is not a serialization requirement for
mode changes.

What I meant to write with a tired brain is:

The racy snapshot is valid under the runqueue lock even when there is a
concurrent mode update going on because the subsequent fixup function
is serialized by the runqueue lock. That means in the following
scenario:

CPU0                            CPU1
clear TRANSIT
....
                                lock(rq)
                                sched_out()
                                  CID has TRANSIT set
                                  ...
                                  // observes TRANSIT=0
                                  localmode = READ_ONCE(...mode);
// sets TRANSIT
switch mode
                                  transfer CID according to localmode
fixup()
  lock(rq) <- Blocked until the schedule on CPU1 is complete

So both sched_out() and fixup() observe consistent state and everything
just works.
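
For illustration only, the interleaving above can be replayed in a small
user-space model. This is a sketch under assumptions, not the kernel code:
the bit layout (MODE_PER_CPU, MODE_TRANSIT), struct model, and all helper
names are made up, an atomic_flag stands in for the runqueue lock, and
fixup is reduced to "re-home the CID according to the new mode". The replay
is single threaded, so the point where CPU0 would block shows up as a
failed trylock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout, for illustration only. */
#define MODE_PER_CPU 0x1u /* set: CPU owns the CID, clear: task owns it */
#define MODE_TRANSIT 0x2u /* a mode change is in progress */

enum owner { OWNER_NONE, OWNER_TASK, OWNER_CPU };

struct model {
	atomic_flag rq_lock;   /* stands in for the runqueue lock */
	atomic_uint mode;      /* stands in for mm_cid::mode */
	enum owner cid_owner;  /* where the CID ended up */
	bool cid_transit;      /* CID still carries the transitional marker */
};

static void model_init(struct model *m)
{
	atomic_flag_clear(&m->rq_lock);
	atomic_init(&m->mode, 0u); /* per-task mode, TRANSIT already cleared */
	m->cid_owner = OWNER_NONE;
	m->cid_transit = true;
}

static bool rq_trylock(struct model *m)
{
	return !atomic_flag_test_and_set_explicit(&m->rq_lock, memory_order_acquire);
}

static void rq_unlock(struct model *m)
{
	atomic_flag_clear_explicit(&m->rq_lock, memory_order_release);
}

/* CPU0: set TRANSIT and switch the ownership mode in one store. */
static void switch_mode(struct model *m, unsigned int newmode)
{
	atomic_store_explicit(&m->mode, newmode | MODE_TRANSIT, memory_order_relaxed);
}

/* CPU1, rq lock held: act on the snapshot taken earlier. */
static void sched_out_transfer(struct model *m, unsigned int localmode)
{
	if (localmode & MODE_TRANSIT)
		return; /* dropping the CID back into the pool is not modeled */
	m->cid_transit = false;
	m->cid_owner = (localmode & MODE_PER_CPU) ? OWNER_CPU : OWNER_TASK;
}

/* CPU0: returns false where the real fixup would block on the rq lock. */
static bool fixup_try(struct model *m)
{
	unsigned int mode;

	if (!rq_trylock(m))
		return false;
	mode = atomic_load_explicit(&m->mode, memory_order_relaxed);
	m->cid_owner = (mode & MODE_PER_CPU) ? OWNER_CPU : OWNER_TASK;
	atomic_store_explicit(&m->mode, mode & ~MODE_TRANSIT, memory_order_relaxed);
	rq_unlock(m);
	return true;
}

/* Replays the timeline in order and returns the final CID owner. */
static enum owner replay_scenario(struct model *m, bool *fixup_was_blocked)
{
	unsigned int localmode;

	if (!rq_trylock(m))                     /* CPU1: lock(rq) */
		return OWNER_NONE;
	localmode = atomic_load_explicit(&m->mode, memory_order_relaxed);
	                                        /* CPU1: observes TRANSIT=0 */
	switch_mode(m, MODE_PER_CPU);           /* CPU0: sets TRANSIT, switch mode */
	*fixup_was_blocked = !fixup_try(m);     /* CPU0: fixup() cannot get rq lock */
	sched_out_transfer(m, localmode);       /* CPU1: transfer per localmode */
	rq_unlock(m);                           /* CPU1: schedule is complete */
	fixup_try(m);                           /* CPU0: fixup() now proceeds */
	return m->cid_owner;
}
```

In this model the stale snapshot hands the CID to the task first, and the
fixup that runs afterwards under the same lock moves it to the CPU, so the
final state matches the new mode.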

Thanks,

tglx