Re: [PATCH] locking/rwlocks: do not starve writers

From: Waiman Long
Date: Fri Jun 17 2022 - 14:58:12 EST


On 6/17/22 13:45, Eric Dumazet wrote:
On Fri, Jun 17, 2022 at 7:42 PM Waiman Long <longman@xxxxxxxxxx> wrote:
On 6/17/22 11:24, Eric Dumazet wrote:
On Fri, Jun 17, 2022 at 5:00 PM Waiman Long <longman@xxxxxxxxxx> wrote:
On 6/17/22 10:57, Shakeel Butt wrote:
On Fri, Jun 17, 2022 at 7:43 AM Waiman Long <longman@xxxxxxxxxx> wrote:
On 6/17/22 08:07, Peter Zijlstra wrote:
On Fri, Jun 17, 2022 at 02:10:39AM -0700, Eric Dumazet wrote:
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -23,16 +23,6 @@ void queued_read_lock_slowpath(struct qrwlock *lock)
 	/*
 	 * Readers come here when they cannot get the lock without waiting
 	 */
-	if (unlikely(in_interrupt())) {
-		/*
-		 * Readers in interrupt context will get the lock immediately
-		 * if the writer is just waiting (not holding the lock yet),
-		 * so spin with ACQUIRE semantics until the lock is available
-		 * without waiting in the queue.
-		 */
-		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
-		return;
-	}
 	atomic_sub(_QR_BIAS, &lock->cnts);
 
 	trace_contention_begin(lock, LCB_F_SPIN | LCB_F_READ);
This is known to break tasklist_lock.
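
For context, the breakage is the classic recursive-read deadlock:
tasklist_lock is taken for reading from interrupt context, and an
interrupt can arrive on a CPU that already holds the lock for reading.
A minimal sketch of the pattern follows; the function names are
illustrative, not an exact kernel call chain:

#include <linux/sched/task.h>	/* tasklist_lock */
#include <linux/spinlock.h>

void cpu0_task_context(void)
{
	read_lock(&tasklist_lock);	/* step 1: reader count > 0 */
	/*
	 * Step 3: an interrupt arrives here and its handler takes
	 * tasklist_lock for reading again.  With strictly fair
	 * queueing it would wait behind the writer from step 2,
	 * so the read_unlock() below is never reached.
	 */
	read_unlock(&tasklist_lock);
}

void cpu1_task_context(void)
{
	/*
	 * Step 2: a writer queues up and blocks later readers, then
	 * spins until the reader count drops to zero -- but CPU0 is
	 * stuck in its interrupt handler: deadlock.
	 */
	write_lock_irq(&tasklist_lock);
	write_unlock_irq(&tasklist_lock);
}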

We certainly can't break the current usage of tasklist_lock.

I am aware of this problem with the networking code and am thinking
about either relaxing the check to exclude softirq or providing a
read_lock_unfair() variant for networking use.
read_lock_unfair() for networking use or tasklist_lock use?
I meant to say read_lock_fair(), but it could also be the other way
around. Thanks for spotting that.
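
To make the first of those options concrete: relaxing the check would
mean narrowing the unfair bypass in queued_read_lock_slowpath() from
all interrupt context to hard IRQ (and NMI) only, so that softirq
readers such as the networking fast paths queue fairly. Roughly, and
untested, for illustration only:

-	if (unlikely(in_interrupt())) {
+	if (unlikely(in_hardirq())) {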

If only tasklist_lock is problematic and needs the unfair variant,
then changing the few read_lock() calls on tasklist_lock will be less
invasive than changing ~1000 read_lock() calls elsewhere...
On second thought, I think the right way is to introduce a fair
variant, if needed. If an arch isn't using qrwlock, the native rwlock
implementation will be unfair. In that sense, the unfair rwlock is the
default. We would only need to change the relevant networking
read_lock() calls to use the fair variant, which would still be unfair
if qrwlock isn't used. We are not going to touch the other read_lock()
calls that don't care about fairness.
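
As a rough sketch of the shape that could take (hypothetical; none of
these names exist in the tree today):

#ifdef CONFIG_QUEUED_RWLOCKS
/* Always join the waiter queue, even in (soft)irq context. */
void read_lock_fair(rwlock_t *lock);
#else
/* The native rwlock is unfair anyway; nothing extra to provide. */
#define read_lock_fair(lock)	read_lock(lock)
#endif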

Hmm... backporting this kind of invasive change to stable kernels will
be a daunting task.

Were rwlocks always unfair, and have we just been lucky?

Yes, rwlocks were always unfair, and they always had this kind of soft
lockup problem, as well as a scalability problem caused by cacheline
bouncing. That was the reason for creating qrwlock, which can at least
provide a fair rwlock in task context. Now we have systems with more
and more CPUs, and that is why you are seeing it all over again with
the networking code.
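
For reference, a simplified view of the qrwlock layout and of why its
fairness is limited to task context (see include/asm-generic/qrwlock.h
for the real definitions):

/*
 *	lock->cnts layout (32 bits):
 *
 *	bits  0-7 : _QW_LOCKED  - a writer holds the lock
 *	bit   8   : _QW_WAITING - a writer is queued
 *	bits  9-31: reader count, in _QR_BIAS increments
 *
 * A task-context reader that sees either writer bit set drops its
 * bias and joins the wait queue behind the writer, so writers are
 * not starved.  The in_interrupt() bypass removed by this patch
 * spins only until _QW_LOCKED clears and ignores _QW_WAITING --
 * that is the unfair path that lets a stream of (soft)irq readers
 * starve a queued writer.
 */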

Cheers,
Longman