Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra
Date: Wed Dec 02 2020 - 07:46:33 EST
On Wed, Dec 02, 2020 at 12:17:31PM +0100, Peter Zijlstra wrote:
> So the obvious 'improvement' here would be something like:
>
> 	for_each_online_cpu(cpu) {
> 		p = rcu_dereference(cpu_rq(cpu)->curr);
> 		if (p->active_mm != mm)
> 			continue;
> 		__cpumask_set_cpu(cpu, tmpmask);
> 	}
> on_each_cpu_mask(tmpmask, ...);
>
> The remote CPU will never switch _to_ @mm, on account of it being quite
> dead, but the scan above is quite prone to false negatives.
>
> Consider that __schedule() sets rq->curr *before* context_switch(), this
> means we'll see next->active_mm, even though prev->active_mm might still
> be our @mm.
>
> Now, because we'll be removing the atomic ops from context_switch()'s
> active_mm swizzling, I think we can change this to something like the
> below. The hope being that the cost of the new barrier can be offset by
> the loss of the atomics.
>
> Hmm ?
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 41404afb7f4c..2597c5c0ccb0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4509,7 +4509,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
> if (!next->mm) { // to kernel
> enter_lazy_tlb(prev->active_mm, next);
>
> - next->active_mm = prev->active_mm;
> if (prev->mm) // from user
> mmgrab(prev->active_mm);
> else
> @@ -4524,6 +4523,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
> * case 'prev->active_mm == next->mm' through
> * finish_task_switch()'s mmdrop().
> */
> + next->active_mm = next->mm;
> switch_mm_irqs_off(prev->active_mm, next->mm, next);
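I think that next->active_mm store should be after switch_mm_irqs_off(),
otherwise we still race: a remote scan could already see next->mm in
->active_mm while this CPU is still running on prev->active_mm.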
>
> if (!prev->mm) { // from kernel
> @@ -5713,11 +5713,9 @@ static void __sched notrace __schedule(bool preempt)
>
> if (likely(prev != next)) {
> rq->nr_switches++;
> - /*
> - * RCU users of rcu_dereference(rq->curr) may not see
> - * changes to task_struct made by pick_next_task().
> - */
> - RCU_INIT_POINTER(rq->curr, next);
> +
> + next->active_mm = prev->active_mm;
> + rcu_assign_pointer(rq->curr, next);
> /*
> * The membarrier system call requires each architecture
> * to have a full memory barrier after updating