[PATCH] locking/rwlocks: do not starve writers

From: Eric Dumazet
Date: Fri Jun 17 2022 - 05:10:48 EST


From: Eric Dumazet <edumazet@xxxxxxxxxx>

Networking is still using rwlocks and read_lock() is called
from softirq context, potentially from many CPUs.

In this (soft)irq context, rwlock code is unfair to writers
and can cause soft lockups.
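
To make the unfairness concrete, here is a minimal single-threaded model
of the lock word, not the kernel code itself. The bit values are assumed
to mirror include/asm-generic/qrwlock.h, all helper names are
hypothetical, and the "fair" rule is a simplification of readers queueing
behind a waiting writer. It suggests that when interrupt-context readers
overlap, the reader count never drops to zero, so a waiting writer never
gets the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit layout assumed to mirror include/asm-generic/qrwlock.h. */
#define QW_WAITING 0x100u /* a writer is queued and waiting */
#define QW_LOCKED  0x0ffu /* a writer holds the lock */
#define QR_BIAS    0x200u /* added once per reader holding the lock */

/*
 * Old slowpath rule for readers in interrupt context: only a writer that
 * already holds the lock blocks them; a waiting writer is ignored.
 */
static bool irq_reader_may_enter(unsigned int cnts)
{
	return !(cnts & QW_LOCKED);
}

/* Simplified fair rule: a new reader waits behind any writer. */
static bool fair_reader_may_enter(unsigned int cnts)
{
	return !(cnts & (QW_WAITING | QW_LOCKED));
}

/* A waiting writer can take the lock only once every reader has left. */
static bool writer_may_lock(unsigned int cnts)
{
	return cnts == QW_WAITING;
}

/*
 * Hypothetical model: start with one reader inside and one writer waiting.
 * In every round a new reader arrives before the current one leaves.
 * Returns true if the writer gets the lock within 'rounds' rounds.
 */
static bool writer_gets_lock(bool (*reader_rule)(unsigned int), int rounds)
{
	unsigned int cnts = QR_BIAS | QW_WAITING;

	for (int i = 0; i < rounds; i++) {
		if (writer_may_lock(cnts))
			return true;
		if (reader_rule(cnts))
			cnts += QR_BIAS; /* overlapping reader is admitted */
		cnts -= QR_BIAS;         /* oldest reader drops the lock */
	}
	return writer_may_lock(cnts);
}

/* Self-check of the model: unfair readers starve the writer, fair ones do not. */
static void check_model(void)
{
	assert(!writer_gets_lock(irq_reader_may_enter, 1000000));
	assert(writer_gets_lock(fair_reader_may_enter, 1000000));
}
```

Calling check_model() from a test harness exercises both rules. The model
ignores the wait queue and memory ordering entirely; it only tracks the
reader count and the writer bits.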

We first noticed an issue with epoll after commit a218cc491420
("epoll: use rwlock in order to reduce ep_poll_callback() contention"),
but it is trivial to brick a host using this repro:

for i in {1..48}
do
  ping -f -n -q 127.0.0.1 &
  sleep 0.1
done

If an unfair version of rwlocks is really needed, we should introduce
a new read_lock_unfair().

[ 673.678717][ C34] watchdog: BUG: soft lockup - CPU#34 stuck for 82s! [ping:17794]
[ 673.700713][ C45] watchdog: BUG: soft lockup - CPU#45 stuck for 82s! [ping:17796]
[ 673.702712][ C46] watchdog: BUG: soft lockup - CPU#46 stuck for 78s! [ping:17802]
[ 673.704712][ C47] watchdog: BUG: soft lockup - CPU#47 stuck for 82s! [ping:17798]
[ 677.636023][ C13] watchdog: BUG: soft lockup - CPU#13 stuck for 82s! [ping:17804]
[ 677.638022][ C14] watchdog: BUG: soft lockup - CPU#14 stuck for 75s! [ping:17825]
[ 677.644021][ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 75s! [ping:17821]
[ 677.650020][ C20] watchdog: BUG: soft lockup - CPU#20 stuck for 82s! [ping:17800]
[ 677.686014][ C38] watchdog: BUG: soft lockup - CPU#38 stuck for 75s! [ping:17819]
[ 681.691318][ C41] watchdog: BUG: soft lockup - CPU#41 stuck for 74s! [ping:17823]
[ 684.657807][ C46] rcu: INFO: rcu_sched self-detected stall on CPU
[ 684.664075][ C46] rcu: 46-....: (1 GPs behind) idle=529/1/0x4000000000000000 softirq=22717/22717 fqs=20200
[ 705.633252][ C14] watchdog: BUG: soft lockup - CPU#14 stuck for 101s! [ping:17825]
[ 706.999058][ T309] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 14-... 41-... } 88575 jiffies s: 2325 root: 0x5/.
[ 706.999069][ T309] rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x4000/. l=1:32-47:0x200/.
[ 709.686574][ C41] watchdog: BUG: soft lockup - CPU#41 stuck for 100s! [ping:17823]
[ 714.457782][ C41] rcu: INFO: rcu_sched self-detected stall on CPU
[ 714.464047][ C41] rcu: 41-....: (1 GPs behind) idle=403/1/0x4000000000000000 softirq=24654/24655 fqs=4653

Fixes: 70af2f8a4f48 ("locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks")
Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Roman Penyaev <rpenyaev@xxxxxxx>
Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>
---
kernel/locking/qrwlock.c | 10 ----------
1 file changed, 10 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 2e1600906c9f5cd868415d20e2d7024c5b1e0531..bf64d14f0fc88613363c3c007bca8c0918709123 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -23,16 +23,6 @@ void queued_read_lock_slowpath(struct qrwlock *lock)
 	/*
 	 * Readers come here when they cannot get the lock without waiting
 	 */
-	if (unlikely(in_interrupt())) {
-		/*
-		 * Readers in interrupt context will get the lock immediately
-		 * if the writer is just waiting (not holding the lock yet),
-		 * so spin with ACQUIRE semantics until the lock is available
-		 * without waiting in the queue.
-		 */
-		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
-		return;
-	}
 	atomic_sub(_QR_BIAS, &lock->cnts);

 	trace_contention_begin(lock, LCB_F_SPIN | LCB_F_READ);
--
2.36.1.476.g0c4daa206d-goog