Re: [RFC PATCH 4/4] Fix: sched/membarrier: p->mm->membarrier_state racy load (v2)

From: Will Deacon
Date: Thu Sep 12 2019 - 09:48:12 EST


On Tue, Sep 10, 2019 at 05:48:02AM -0400, Mathieu Desnoyers wrote:
> ----- On Sep 8, 2019, at 5:51 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx wrote:
>
> > On Sun, Sep 8, 2019 at 6:49 AM Mathieu Desnoyers
> > <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> >>
> >> +static void sync_runqueues_membarrier_state(struct mm_struct *mm)
> >> +{
> >> + int membarrier_state = atomic_read(&mm->membarrier_state);
> >> + bool fallback = false;
> >> + cpumask_var_t tmpmask;
> >> +
> >> + if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
> >> + /* Fallback for OOM. */
> >> + fallback = true;
> >> + }
> >> +
> >> + /*
> >> + * For each cpu runqueue, if the task's mm match @mm, ensure that all
> >> + * @mm's membarrier state set bits are also set in in the runqueue's
> >> + * membarrier state. This ensures that a runqueue scheduling
> >> + * between threads which are users of @mm has its membarrier state
> >> + * updated.
> >> + */
> >> + cpus_read_lock();
> >> + rcu_read_lock();
> >> + for_each_online_cpu(cpu) {
> >> + struct rq *rq = cpu_rq(cpu);
> >> + struct task_struct *p;
> >> +
> >> + p = task_rcu_dereference(&rq->curr);
> >> + if (p && p->mm == mm) {
> >> + if (!fallback)
> >> + __cpumask_set_cpu(cpu, tmpmask);
> >> + else
> >> + smp_call_function_single(cpu, ipi_sync_rq_state,
> >> + mm, 1);
> >> + }
> >> + }
> >
> > I really absolutely detest this whole "fallback" code.
> >
> > It will never get any real testing, and the code is just broken.
> >
> > Why don't you just use the mm_cpumask(mm) unconditionally? Yes, it
> > will possibly call too many CPU's, but this fallback code is just
> > completely disgusting.
> >
> > Do a simple and clean implementation. Then, if you can show real
> > performance issues (which I doubt), maybe do something else, but even
> > then you should never do something that will effectively create cases
> > that have absolutely zero test-coverage.
>
> A few points worth mentioning here:
>
> 1) As I stated earlier, using mm_cpumask in its current form is not
> an option for membarrier. For two reasons:
>
> A) The mask is not populated on all architectures (e.g. arm64 does
> not populate it),
>
> B) Even if it was populated on all architectures, we would need to
> carefully audit and document every spot where this mm_cpumask
> is set or cleared within each architecture code, and ensure we
> have the required memory barriers between user-space memory
> accesses and those stores, documenting those requirements into
> each architecture code in the process. This seems to be a lot of
> useless error-prone code churn.
>
> 2) I should actually use GFP_KERNEL rather than GFP_NOWAIT in this
> membarrier registration code. But it can still fail. However, the other
> membarrier code using the same fallback pattern (private and global
> expedited) documents that those membarrier commands do not block in
> the membarrier(2) man page, so GFP_NOWAIT is appropriate in those cases.
>
> 3) Testing-wise, I fully agree with your argument of lacking test coverage.
> One option I'm considering would be to add a selftest based on the
> fault-injection infrastructure, which would ensure that we have coverage
> of the failure case in the kernel selftests.
>
> Thoughts ?

So the man page for sys_membarrier states that the expedited variants "never
block", which feels pretty strong. Do any other system calls claim to
provide this guarantee without a failure path if blocking is necessary?
Given that the whole thing is preemptible, I'm also curious as to how
exactly userspace relies on this non-blocking guarantee. I'd have thought
that you could either just bite the bullet and block in the rare case that
you need to when allocating the cpumask, or you could just return
-EWOULDBLOCK on allocation failure, given that I suspect there are very few
users of this system call right now and it's not yet supported by glibc
afaik.

Will