Re: [PATCH 1/2] locking/percpu-rwsem: Optimize readers and reduce global impact

From: Peter Zijlstra
Date: Fri Jul 15 2016 - 15:47:15 EST


On Fri, Jul 15, 2016 at 06:30:54PM +0200, Oleg Nesterov wrote:
> On 07/14, Peter Zijlstra wrote:
> >
> > Currently the percpu-rwsem switches to (global) atomic ops while a
> > writer is waiting; which could be quite a while and slows down
> > releasing the readers.
> >
> > This patch cures this problem by ordering the reader-state vs
> > reader-count (see the comments in __percpu_down_read() and
> > percpu_down_write()). This changes a global atomic op into a full
> > memory barrier, which doesn't have the global cacheline contention.
>
> I've applied this patch + another change you sent on top of it.
>
> Everything looks good to me except the __this_cpu_inc() in
> __percpu_down_read(),
>
> > + __down_read(&sem->rw_sem);
> > + __this_cpu_inc(*sem->read_count);
> > + __up_read(&sem->rw_sem);
>
> Preemption is already enabled, don't we need this_cpu_inc()?

Ah indeed. This mistake is quite old it seems, good catch.
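
For anyone following along, here is a rough userspace sketch of why the
distinction matters. __this_cpu_inc() expects the caller to have
preemption disabled already, while this_cpu_inc() is safe against
preemption on its own. On architectures where the increment is not a
single instruction, the non-safe form can be split into load/add/store,
and a task preempted in the middle can overwrite someone else's update.
Everything below (percpu_counter_model, racy_two_increments(),
safe_two_increments(), run_demo()) is a hypothetical model for
illustration, not kernel code, and the interleaving is written out by
hand rather than produced by a real scheduler.

```c
#include <assert.h>

#define NR_CPUS 2

/* Hypothetical stand-in for a per-CPU counter such as sem->read_count. */
struct percpu_counter_model {
	long count[NR_CPUS];
};

/* Sum all per-CPU slots, as a writer would when checking for readers. */
long model_sum(const struct percpu_counter_model *c)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->count[cpu];
	return sum;
}

/*
 * Models an increment that is NOT preemption-safe: task A resolves its
 * per-CPU slot and loads the value, is preempted, and only later stores
 * the result. Task B increments the same slot in between.
 */
long racy_two_increments(void)
{
	struct percpu_counter_model c = { { 0, 0 } };
	long *slot = &c.count[0];	/* task A: "this CPU" is CPU 0 */
	long tmp = *slot;		/* task A: load old value */

	/* Task A is preempted here; task B runs on CPU 0 and increments. */
	c.count[0] += 1;

	/* Task A resumes (possibly on another CPU) and stores a stale result. */
	*slot = tmp + 1;

	return model_sum(&c);		/* task B's increment is lost */
}

/*
 * Models a preemption-safe increment: each task's read-modify-write is
 * indivisible with respect to preemption, so it lands whole on whichever
 * CPU the task is running on at that moment.
 */
long safe_two_increments(void)
{
	struct percpu_counter_model c = { { 0, 0 } };

	c.count[0] += 1;		/* task B on CPU 0 */
	c.count[1] += 1;		/* task A, after migrating to CPU 1 */

	return model_sum(&c);
}

/* Two increments were issued in both scenarios; only the safe one keeps both. */
void run_demo(void)
{
	assert(racy_two_increments() == 1);
	assert(safe_two_increments() == 2);
}
```

Since the writer decides whether readers are gone by summing the per-CPU
counts, a lost increment in the model corresponds to a reader the writer
cannot see.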

> > -EXPORT_SYMBOL_GPL(percpu_up_write);
> > +EXPORT_SYMBOL(percpu_up_write);
>
> and this one ;) I do not really care, but it seems you did this change
> by accident.

Yep, oops ;-)

> Actually, I _think_ we can do some cleanups/improvements on top of this
> change, but we can do this later. In particular, _perhaps_ we can avoid
> the unconditional wakeup in __percpu_up_read(), but I am not sure and in
> any case this needs another change.
>
> Reviewed-by: Oleg Nesterov <oleg@xxxxxxxxxx>

Thanks!