Re: [PATCH 5/5] locking/percpu-rwsem: Remove the embedded rwsem

From: Peter Zijlstra
Date: Tue Dec 17 2019 - 05:28:22 EST


On Tue, Dec 17, 2019 at 11:26:54AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 19, 2019 at 04:58:26PM +0100, Oleg Nesterov wrote:
> > On 11/19, Waiman Long wrote:
> > >
> > > On 11/13/19 5:21 AM, Peter Zijlstra wrote:
> > > > +static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry,
> > > > + unsigned int mode, int wake_flags,
> > > > + void *key)
> > > > +{
> > > > + struct task_struct *p = get_task_struct(wq_entry->private);
> > > > + bool reader = wq_entry->flags & WQ_FLAG_CUSTOM;
> > > > + struct percpu_rw_semaphore *sem = key;
> > > > +
> > > > + /* concurrent against percpu_down_write(), can get stolen */
> > > > + if (!__percpu_rwsem_trylock(sem, reader))
> > > > + return 1;
> > > > +
> > > > + list_del_init(&wq_entry->entry);
> > > > + smp_store_release(&wq_entry->private, NULL);
> > > > +
> > > > + wake_up_process(p);
> > > > + put_task_struct(p);
> > > > +
> > > > + return !reader; /* wake 'all' readers and 1 writer */
> > > > +}
> > > > +
> > >
> > > If I read the function correctly, you are setting the WQ_FLAG_EXCLUSIVE
> > > for both readers and writers and __wake_up() is called with an exclusive
> > > count of one. So only one reader or writer is woken up each time.
> >
> > This depends on what percpu_rwsem_wake_function() returns. If it returns 1,
> > __wake_up_common() stops, exactly because all waiters have WQ_FLAG_EXCLUSIVE.
>
> Indeed, let me see if I can clarify that somehow.
>
> > > However, the comment above said we wake 'all' readers and 1 writer. That
> > > doesn't match the actual code, IMO.
> >
> > Well, "'all' readers" probably means "all readers before writer",
>
> Correct.

Does this clarify?

--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -101,6 +101,19 @@ static bool __percpu_rwsem_trylock(struc
return __percpu_down_write_trylock(sem);
}

+/*
+ * The return value of wait_queue_entry::func means:
+ *
+ * <0 - error, wakeup is terminated and the error is returned
+ * 0 - no wakeup, a next waiter is tried
+ * >0 - woken, if EXCLUSIVE, counted towards @nr_exclusive.
+ *
+ * We use EXCLUSIVE for both readers and writers to preserve FIFO order,
+ * and play games with the return value to allow waking multiple readers.
+ *
+ * Specifically, we wake readers until we've woken a single writer, or until a
+ * trylock fails.
+ */
static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry,
unsigned int mode, int wake_flags,
void *key)
@@ -119,7 +132,7 @@ static int percpu_rwsem_wake_function(st
wake_up_process(p);
put_task_struct(p);

- return !reader; /* wake 'all' readers and 1 writer */
+ return !reader; /* wake (readers until) 1 writer */
}

static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)