Re: [PATCH v2] locking/percpu-rwsem: Optimize readers and reduce global impact

From: Peter Zijlstra
Date: Wed Aug 10 2016 - 15:12:49 EST


On Tue, Aug 09, 2016 at 04:47:38PM -0700, John Stultz wrote:
> On Tue, Aug 9, 2016 at 2:51 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > Currently the percpu-rwsem switches to (global) atomic ops while a
> > writer is waiting; which could be quite a while and slows down
> > releasing the readers.
> >
> > This patch cures this problem by ordering the reader-state vs
> > reader-count (see the comments in __percpu_down_read() and
> > percpu_down_write()). This changes a global atomic op into a full
> > memory barrier, which doesn't have the global cacheline contention.
> >
> > This also enables using the percpu-rwsem with rcu_sync disabled in order
> > to bias the implementation differently, reducing the writer latency by
> > adding some cost to readers.
>
> So this by itself doesn't help us much, but including the following
> from Oleg does help quite a bit:

Correct, Oleg was going to send his rcu_sync rework on top of this. But
since it's holiday season, things might be a tad delayed.
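
For anyone following along, the ordering described in the quoted
changelog can be sketched in userspace C11 roughly as below. This is an
illustration under stated assumptions, not the kernel code: the names
(sketch_rwsem, NSLOTS, the sketch_* helpers) are hypothetical, a small
array of atomic counters stands in for the per-CPU reader count, and
both sides are trylock-style instead of sleeping. The point is the
pairing: the reader bumps its own counter, issues a full barrier, then
checks the writer state; the writer sets its state, issues a full
barrier, then sums the counters. With both barriers in place, at least
one side must observe the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS 4 /* hypothetical stand-in for the number of CPUs */

struct sketch_rwsem {
	atomic_int  read_count[NSLOTS]; /* one reader counter per slot */
	atomic_bool readers_block;      /* set while a writer is pending */
};

static void sketch_init(struct sketch_rwsem *s)
{
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&s->read_count[i], 0);
	atomic_init(&s->readers_block, false);
}

/* Sum of all slots; only the total is meaningful, not a single slot. */
static int sketch_sum_readers(struct sketch_rwsem *s)
{
	int sum = 0;

	for (int i = 0; i < NSLOTS; i++)
		sum += atomic_load_explicit(&s->read_count[i],
					    memory_order_acquire);
	return sum;
}

/* Reader: touches only its own slot, so no shared cacheline bounces. */
static bool sketch_down_read_trylock(struct sketch_rwsem *s, int slot)
{
	assert(slot >= 0 && slot < NSLOTS);

	atomic_fetch_add_explicit(&s->read_count[slot], 1,
				  memory_order_relaxed);
	/* Full barrier: count increment is ordered before the state load. */
	atomic_thread_fence(memory_order_seq_cst);

	if (!atomic_load_explicit(&s->readers_block, memory_order_relaxed))
		return true;

	/* A writer is pending: undo the increment and report failure. */
	atomic_fetch_sub_explicit(&s->read_count[slot], 1,
				  memory_order_release);
	return false;
}

static void sketch_up_read(struct sketch_rwsem *s, int slot)
{
	assert(slot >= 0 && slot < NSLOTS);
	atomic_fetch_sub_explicit(&s->read_count[slot], 1,
				  memory_order_release);
}

/* Writer: publish the state, full barrier, then look for readers. */
static bool sketch_down_write_trylock(struct sketch_rwsem *s)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&s->readers_block,
					    &expected, true))
		return false; /* another writer already holds the state */

	/* Full barrier: state store is ordered before the count loads. */
	atomic_thread_fence(memory_order_seq_cst);

	if (sketch_sum_readers(s) == 0)
		return true;

	/* Readers still active: back off (a real lock would wait here). */
	atomic_store(&s->readers_block, false);
	return false;
}

static void sketch_up_write(struct sketch_rwsem *s)
{
	atomic_store(&s->readers_block, false);
}
```

Note that the reader path never writes a shared location; the only
cross-thread cost it pays is the barrier. The counters are atomic here
only because userspace threads are not pinned to slots the way per-CPU
data is in the kernel.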