Re: [RFC PATCH] introduce sys_membarrier(): process-wide memorybarrier

From: Mathieu Desnoyers
Date: Sun Jan 10 2010 - 13:26:58 EST


* Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote:
> On Sun, Jan 10, 2010 at 06:50:09AM -0500, Steven Rostedt wrote:
> > On Sat, 2010-01-09 at 21:25 -0800, Paul E. McKenney wrote:
> > > On Sat, Jan 09, 2010 at 09:12:58PM -0500, Steven Rostedt wrote:
> >
> > > > > < user space >
> > > > >
> > > > > < misses that CPU 2 is in rcu section >
> > > >
> > > >
> > > > If the TLB flush misses that CPU 2 has a threaded task, and does not
> > > > flush CPU 2s TLB, it can also risk the same type of crash.
> > >
> > > But isn't the VM's locking helping us out in that case?
> > >
> > > > > [CPU 2's ->curr update now visible]
> > > > >
> > > > > [CPU 2's rcu_read_lock() store now visible]
> > > > >
> > > > > free(obj);
> > > > >
> > > > > use_object(obj); <=== crash!
> > > > >
> > > >
> > > > Think about it. If you change a process mmap, say you updated a mmap of
> > > > a file by flushing out one page and replacing it with another. If the
> > > > above missed sending to CPU 2, then CPU 2 may still be accessing the old
> > > > page of the file, and not the new one.
> > > >
> > > > I think this may be the safe bet.
> > >
> > > You might well be correct that we can access that bitmap locklessly,
> > > but there are additional things (like the loading of the arch-specific
> > > page-table register) that are likely to be helping in the VM case, but
> > > not necessarily helping in this case.
> >
> > Then perhaps the sys_membarrier() should just do a flush_tlb()? That
> > should guarantee the synchronization, right?
>
> Isn't just grabbing the runqueue locks a bit more straightforward and
> more clearly correct? Again, this is the URCU slowpath, so it is hard
> to get too excited about a 15% performance penalty.
>

Even when taking the spinlocks, efficient iteration on active threads is
done with for_each_cpu(cpu, mm_cpumask(current->mm)), which depends on
the same cpumask, and thus requires the same memory barriers around the
updates.

We could switch to an inefficient iteration on all online CPUs instead,
and check read runqueue ->mm with the spinlock held. Is that what you
propose ? This will cause reading of large amounts of runqueue
information, especially on large systems running few threads. The other
way around is to iterate on all the process threads: in this case, small
systems running many threads will have to read information about many
inactive threads, which is not much better.

Thanks,

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/