Re: [PATCH RFC v3 1/3] sbitmap: fix that same waitqueue can be woken up continuously

From: Yu Kuai
Date: Tue Jul 12 2022 - 09:25:35 EST


On 2022/07/11 22:20, Jan Kara wrote:
On Sun 10-07-22 12:21:58, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

__sbq_wake_up                            __sbq_wake_up
 sbq_wake_ptr -> assume 0
                                          sbq_wake_ptr -> 0
 atomic_cmpxchg -> succeed
                                          atomic_cmpxchg -> failed
                                          return true

 atomic_read(&sbq->wake_index) -> still 0
 sbq_index_atomic_inc -> inc to 1
                                          if (waitqueue_active(&ws->wait))
                                           if (wake_index != atomic_read(&sbq->wake_index))
                                            atomic_set -> reset from 1 to 0
 wake_up_nr -> wake up first waitqueue
                                          // continue to wake up in first waitqueue

Fix the problem by using atomic_cmpxchg() instead of atomic_set()
to update 'wake_index'.

Fixes: 417232880c8a ("sbitmap: Replace cmpxchg with xchg")
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>

I don't think this patch is really needed after the following patches. As
I see it, wake_index is just a performance optimization (plus a fairness
improvement) but in principle the code in sbq_wake_ptr() is always prone to
races as the waitqueue it returns needn't have any waiters by the time we
return. So for correctness the check-and-retry loop needs to happen at
higher level than inside sbq_wake_ptr() and occasional wrong setting of
wake_index will result only in a bit of unfairness and more scanning
looking for suitable waitqueue but I don't think that really justifies the
cost of atomic operations in cmpxchg loop...

You're right that this patch only improves fairness. However, in heavy-load tests I found that this 'wrong setting of wake_index' can happen frequently; as a consequence, some waitqueues can be empty while other waitqueues have a lot of waiters.

Fixing the unfairness thoroughly would take a lot more work, so I can drop this patch for now.


lib/sbitmap.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 29eb0484215a..b46fce1beb3a 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -579,19 +579,24 @@ EXPORT_SYMBOL_GPL(sbitmap_queue_min_shallow_depth);
 
 static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
 {
-	int i, wake_index;
+	int i, wake_index, old_wake_index;
 
+again:
 	if (!atomic_read(&sbq->ws_active))
 		return NULL;
 
-	wake_index = atomic_read(&sbq->wake_index);
+	old_wake_index = wake_index = atomic_read(&sbq->wake_index);
 	for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
 		struct sbq_wait_state *ws = &sbq->ws[wake_index];
 
 		if (waitqueue_active(&ws->wait)) {
-			if (wake_index != atomic_read(&sbq->wake_index))
-				atomic_set(&sbq->wake_index, wake_index);
-			return ws;
+			if (wake_index == old_wake_index)
+				return ws;
+
+			if (atomic_cmpxchg(&sbq->wake_index, old_wake_index,
+					   wake_index) == old_wake_index)
+				return ws;
+
+			goto again;
 		}
 
 		wake_index = sbq_index_inc(wake_index);