Re: [patch V2 0/4] sched/mmcid: Cure mode transition woes

From: Mathieu Desnoyers

Date: Mon Feb 02 2026 - 16:23:00 EST

On 2026-02-02 07:54, Peter Zijlstra wrote:
> On Mon, Feb 02, 2026 at 06:46:34AM -0500, Mathieu Desnoyers wrote:
> > On 2026-02-02 05:14, Peter Zijlstra wrote:
> > > On Mon, Feb 02, 2026 at 10:39:35AM +0100, Thomas Gleixner wrote:
> > > > --- a/kernel/sched/core.c
> > > > +++ b/kernel/sched/core.c
> > > > @@ -10445,6 +10445,12 @@ static bool mm_update_max_cids(struct mm
> > > >  	/* Flip the mode and set the transition flag to bridge the transfer */
> > > >  	WRITE_ONCE(mc->mode, mc->mode ^ (MM_CID_TRANSIT | MM_CID_ONCPU));
> > > > +	/*
> > > > +	 * Order the store against the subsequent fixups so that
> > > > +	 * acquire(rq::lock) cannot be reordered by the CPU before the
> > > > +	 * store.
> > > > +	 */
> > > > +	smp_mb();
> > > >  	return true;
> > > >  }
> > > > @@ -10487,6 +10493,16 @@ static inline void mm_update_cpus_allowe
> > > >  	irq_work_queue(&mc->irq_work);
> > > >  }
> > > > +static inline void mm_cid_complete_transit(struct mm_struct *mm, unsigned int mode)
> > > > +{
> > > > +	/*
> > > > +	 * Ensure that the store removing the TRANSIT bit cannot be
> > > > +	 * reordered by the CPU before the fixups have been completed.
> > > > +	 */
> > > > +	smp_mb();
> > > > +	WRITE_ONCE(mm->mm_cid.mode, mode);
> > > > +}
> > >
> > > I think this could've been smp_store_release(), but this is the slow
> > > path so nobody cares and this is nicely symmetric.
> >
> > I'm not sure the store-release would work here. What load-acquire
> > would it pair with ?
>
> The purpose here -- per the comment is to ensure the fixup stuff is
> visible before the TRANSIT bit goes 0, store-release ensures that.
>
> That pairs with whatever cares about this barrier now.
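
For reference, the store-release/load-acquire pairing being debated can
be sketched in user space. This is only an illustration using C11
atomics and pthreads, not kernel code: TRANSIT, fixup_data, cid_mode and
the reader thread are hypothetical stand-ins, and the thread does not
identify which load in the scheduler would play the acquire side.

```c
#include <pthread.h>
#include <stdatomic.h>

#define TRANSIT 0x1u		/* hypothetical stand-in for MM_CID_TRANSIT */

static int fixup_data;		/* plain data written by the "fixup" phase */
static atomic_uint cid_mode;	/* hypothetical stand-in for mm_cid.mode */

/* Writer: finish the fixups, then clear TRANSIT with release semantics. */
static void *ra_writer(void *arg)
{
	(void)arg;
	fixup_data = 42;
	atomic_store_explicit(&cid_mode, 0, memory_order_release);
	return NULL;
}

/* Reader: once the acquire load sees TRANSIT cleared, the fixups are visible. */
static void *ra_reader(void *arg)
{
	int *seen = arg;

	while (atomic_load_explicit(&cid_mode, memory_order_acquire) & TRANSIT)
		;	/* spin until the writer publishes */
	*seen = fixup_data;
	return NULL;
}

/* Run one writer/reader pair and return the value the reader observed. */
static int run_release_acquire_demo(void)
{
	pthread_t w, r;
	int seen = -1;

	fixup_data = 0;
	atomic_store(&cid_mode, TRANSIT);
	pthread_create(&r, NULL, ra_reader, &seen);
	pthread_create(&w, NULL, ra_writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

Calling run_release_acquire_demo() from a main() built with -pthread
exercises the pairing. The guarantee only holds because the reader uses
an acquire load on the same variable; a release store with no matching
acquire on the other side orders nothing for that observer.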

Now that I think about it some more, I think my advice about adding
smp_mb() before rq lock/after rq unlock was wrong.

The store setting transit will be ordered by the rq _unlock_, which acts
as a release barrier, and the store clearing transit will be ordered by
the rq _lock_, which acts as an acquire barrier. Those pair with the
respective rq unlock/lock in the scheduler.

So AFAIU we can remove those two useless smp_mb().
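
A user-space sketch of the unlock-to-lock ordering this relies on, using
a pthread mutex as a hypothetical stand-in for rq::lock (demo_rq_lock,
transit_flag and handoff are made-up names). It only shows that a store
made before an unlock is visible to a thread that later takes the same
lock; it does not by itself show that this is sufficient for the mm_cid
transition.

```c
#include <pthread.h>

static pthread_mutex_t demo_rq_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int transit_flag;	/* written before taking the lock */
static int handoff;			/* protected by demo_rq_lock */

/* Writer: set the flag, then pass through a lock/unlock of the lock. */
static void *lock_writer(void *arg)
{
	(void)arg;
	transit_flag = 1;
	pthread_mutex_lock(&demo_rq_lock);
	handoff = 1;
	pthread_mutex_unlock(&demo_rq_lock);	/* release: publishes both stores */
	return NULL;
}

/* Reader: after locking behind the writer's unlock, the flag is visible. */
static void *lock_reader(void *arg)
{
	unsigned int *seen = arg;
	int done = 0;

	while (!done) {
		pthread_mutex_lock(&demo_rq_lock);	/* acquire */
		if (handoff) {
			*seen = transit_flag;
			done = 1;
		}
		pthread_mutex_unlock(&demo_rq_lock);
	}
	return NULL;
}

/* Run one writer/reader pair and return the flag value the reader saw. */
static unsigned int run_lock_order_demo(void)
{
	pthread_t w, r;
	unsigned int seen = 0;

	transit_flag = 0;
	handoff = 0;
	pthread_create(&r, NULL, lock_reader, &seen);
	pthread_create(&w, NULL, lock_writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

Note the direction: the reader is only guaranteed to see the flag once
it has observed something the writer did inside the critical section.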

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com