Re: [RFC PATCH v3] sched: Fix performance regression introduced by mm_cid

From: Mathieu Desnoyers
Date: Tue Apr 11 2023 - 08:57:25 EST

Next message: Alexey Kardashevskiy: "[PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap"
Previous message: Konrad Dybcio: "Re: [PATCH v3 2/3] arm64: dts: qcom: Add base qrb4210-rb2 board dts"
In reply to: Mathieu Desnoyers: "Re: [RFC PATCH v3] sched: Fix performance regression introduced by mm_cid"
Next in thread: Peter Zijlstra: "Re: [RFC PATCH v3] sched: Fix performance regression introduced by mm_cid"

On 2023-04-11 05:37, Peter Zijlstra wrote:

> On Fri, Apr 07, 2023 at 09:14:36PM -0400, Mathieu Desnoyers wrote:
>
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index bc0e1cd0d6ac..f3e7dc2cd1cc 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -3354,6 +3354,37 @@ static inline int mm_cid_get(struct mm_struct *mm)
>>  static inline void switch_mm_cid(struct task_struct *prev, struct task_struct *next)
>>  {
>> +	/*
>> +	 * Provide a memory barrier between rq->curr store and load of
>> +	 * {prev,next}->mm->pcpu_cid[cpu] on rq->curr->mm transition.
>> +	 *
>> +	 * Should be adapted if context_switch() is modified.
>> +	 */
>> +	if (!next->mm) { // to kernel
>> +		/*
>> +		 * user -> kernel transition does not guarantee a barrier, but
>> +		 * we can use the fact that it performs an atomic operation in
>> +		 * mmgrab().
>> +		 */
>> +		if (prev->mm) // from user
>> +			smp_mb__after_mmgrab();
>> +		/*
>> +		 * kernel -> kernel transition does not change rq->curr->mm
>> +		 * state. It stays NULL.
>> +		 */
>> +	} else { // to user
>> +		/*
>> +		 * kernel -> user transition does not provide a barrier
>> +		 * between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
>> +		 * Provide it here.
>> +		 */
>> +		if (!prev->mm) // from kernel
>> +			smp_mb();
>> +		/*
>> +		 * user -> user transition guarantees a memory barrier through
>> +		 * switch_mm().
>> +		 */
>
> What about the user->user case where next->mm == prev->mm? There,
> sys_membarrier() relies on finish_task_switch()'s mmdrop(), but we
> can't.

AFAIU, finish_task_switch()'s mmdrop() covers the case where:

    * [...] or in
    * case 'prev->active_mm == next->mm' through
    * finish_task_switch()'s mmdrop().

which applies when we schedule from a kernel thread (which kept the
prior user task's mm as its active mm) to a user task with the same mm.

But isn't this really a kernel -> user transition rather than a
user -> user one?

Why should either membarrier or mm_cid care about a switch from
prev->mm to next->mm where the mm is unchanged? From the perspective
of the rq->curr->mm comparison, it does not register as a transition.
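To illustrate the point above, here is a minimal user-space model of one CPU switching from user task A to a kernel thread, then to another thread of A. It is not kernel code: struct model_task, classify_switch() and the simplified lazy-TLB rule (a kernel thread borrows the outgoing task's active_mm) are assumptions made for illustration. Like switch_mm_cid(), it classifies each switch by prev->mm and next->mm, and it separately records whether prev->active_mm == next->mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct; only its address matters. */
struct model_mm {
	int id;
};

/* Hypothetical stand-in for the task_struct fields involved. */
struct model_task {
	struct model_mm *mm;        /* NULL for kernel threads */
	struct model_mm *active_mm; /* mm in use while the task runs */
};

enum switch_kind {
	USER_TO_USER,
	USER_TO_KERNEL,
	KERNEL_TO_USER,
	KERNEL_TO_KERNEL,
};

struct switch_info {
	enum switch_kind kind;
	bool active_mm_matches; /* prev->active_mm == next->mm */
};

static struct switch_info classify_switch(const struct model_task *prev,
					  struct model_task *next)
{
	struct switch_info info;

	/* A user task always runs on its own mm. */
	assert(prev->mm == NULL || prev->active_mm == prev->mm);

	/* Branch on ->mm, like switch_mm_cid(), not on ->active_mm. */
	if (!next->mm)
		info.kind = prev->mm ? USER_TO_KERNEL : KERNEL_TO_KERNEL;
	else
		info.kind = prev->mm ? USER_TO_USER : KERNEL_TO_USER;

	info.active_mm_matches = (prev->active_mm == next->mm);

	/* Simplified lazy TLB: a kernel thread borrows the outgoing active_mm. */
	next->active_mm = next->mm ? next->mm : prev->active_mm;
	return info;
}
```

In the trace A -> kernel thread -> A', the second switch is the 'prev->active_mm == next->mm' case from the quoted comment, yet it is classified as kernel -> user, which is the branch where the patch issues an explicit smp_mb().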

I'll update my comment in switch_mm_cid() to:

/*
* user -> user transition guarantees a memory barrier through
* switch_mm() when current->mm changes. If current->mm is
* unchanged, no barrier is needed.
*/
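As a hedged summary, the transitions handled by switch_mm_cid(), including the unchanged-mm case covered by the updated comment, can be written as a decision table. This is a user-space sketch, not kernel code: struct model_mm, enum barrier_source and mm_cid_barrier_source() are hypothetical, and the function only reports which mechanism the reasoning relies on; it issues no barrier itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct; only its address matters. */
struct model_mm {
	int id;
};

/* Mechanism relied upon to order the rq->curr store against pcpu_cid loads. */
enum barrier_source {
	BARRIER_MMGRAB_ATOMIC, /* user -> kernel: mmgrab() atomic + smp_mb__after_mmgrab() */
	BARRIER_EXPLICIT_MB,   /* kernel -> user: explicit smp_mb() */
	BARRIER_SWITCH_MM,     /* user -> user, mm changes: implied by switch_mm() */
	BARRIER_NOT_NEEDED,    /* rq->curr->mm unchanged (kernel -> kernel, or same mm) */
};

/* Models the branch structure of switch_mm_cid(); a NULL mm means kernel thread. */
static enum barrier_source mm_cid_barrier_source(const struct model_mm *prev_mm,
						 const struct model_mm *next_mm)
{
	if (!next_mm) {				/* to kernel */
		if (prev_mm)			/* from user */
			return BARRIER_MMGRAB_ATOMIC;
		return BARRIER_NOT_NEEDED;	/* kernel -> kernel: stays NULL */
	}
	if (!prev_mm)				/* from kernel */
		return BARRIER_EXPLICIT_MB;
	/* user -> user: only a change of mm counts as a transition. */
	return prev_mm != next_mm ? BARRIER_SWITCH_MM : BARRIER_NOT_NEEDED;
}
```

The last row, user -> user with the same mm, is the case raised above: since rq->curr->mm does not change, there is no transition to order.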

Thanks,

Mathieu

>> +	}
>>  	if (prev->mm_cid_active) {
>>  		mm_cid_put_lazy(prev);
>>  		prev->mm_cid = -1;

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
