Re: [RFC PATCH 1/2] Fix: sched/membarrier: p->mm->membarrier_state racy load

From: Peter Zijlstra
Date: Tue Sep 03 2019 - 16:25:04 EST


On Tue, Sep 03, 2019 at 04:11:34PM -0400, Mathieu Desnoyers wrote:

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 9f51932bd543..e24d52a4c37a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1130,6 +1130,10 @@ struct task_struct {
>  	unsigned long numa_pages_migrated;
>  #endif /* CONFIG_NUMA_BALANCING */
>
> +#ifdef CONFIG_MEMBARRIER
> +	atomic_t membarrier_state;
> +#endif
> +
>  #ifdef CONFIG_RSEQ
>  	struct rseq __user *rseq;
>  	u32 rseq_sig;
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 4a7944078cc3..3577cd7b3dbb 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -371,7 +371,17 @@ static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
>  static inline void membarrier_execve(struct task_struct *t)
>  {
>  	atomic_set(&t->mm->membarrier_state, 0);
> +	atomic_set(&t->membarrier_state, 0);
>  }
> +
> +static inline void membarrier_prepare_task_switch(struct task_struct *t)
> +{
> +	if (!t->mm)
> +		return;
> +	atomic_set(&t->membarrier_state,
> +		   atomic_read(&t->mm->membarrier_state));
> +}
> +

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 010d578118d6..8d4f1f20db15 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3038,6 +3038,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
>  	perf_event_task_sched_out(prev, next);
>  	rseq_preempt(prev);
>  	fire_sched_out_preempt_notifiers(prev, next);
> +	membarrier_prepare_task_switch(next);
>  	prepare_task(next);
>  	prepare_arch_switch(next);
>  }


Yuck yuck yuck..

So the problem I have with this is that we now touch yet another
cacheline on every context switch :/

Why can't we frob this state into a line/word we already have to
touch unconditionally anyway, like the thread_info::flags word for
example?
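
Something along these lines, completely untested; TIF_MEMBARRIER_STATE
is a made-up bit that each arch would still have to reserve in its
thread_info flags, and only the private-expedited state is mirrored
here for illustration:

static inline void membarrier_prepare_task_switch(struct task_struct *t)
{
	if (!t->mm)
		return;
	/* Mirror the mm state into a word we already touch at switch. */
	if (atomic_read(&t->mm->membarrier_state) &
	    MEMBARRIER_STATE_PRIVATE_EXPEDITED)
		set_tsk_thread_flag(t, TIF_MEMBARRIER_STATE);
	else
		clear_tsk_thread_flag(t, TIF_MEMBARRIER_STATE);
}

The membarrier IPI selection could then use test_tsk_thread_flag()
instead of chasing p->mm.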

The proposed membarrier_prepare_task_switch() also does the store
unconditionally, even though in the most common case it won't have to.
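
Sticking with the per-task copy from the patch, a sketch of doing the
store only when the cached value is actually stale (equally untested):

static inline void membarrier_prepare_task_switch(struct task_struct *t)
{
	int mm_state;

	if (!t->mm)
		return;
	mm_state = atomic_read(&t->mm->membarrier_state);
	/* Only dirty the task_struct line when the copy went stale. */
	if (atomic_read(&t->membarrier_state) != mm_state)
		atomic_set(&t->membarrier_state, mm_state);
}

That still reads both lines on every switch though, so it only avoids
the store, not the extra cacheline touch.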