Re: [PATCH next] sbitmap: fix lockup while swapping

From: Hugh Dickins
Date: Fri Sep 23 2022 - 17:30:00 EST


On Fri, 23 Sep 2022, Keith Busch wrote:

> Does the following fix the observation? Rationale being that there's no reason
> to spin on the current wait state that is already under handling; let
> subsequent clearings proceed to the next inevitable wait state immediately.

It's running fine without lockup so far; but doesn't this change merely
narrow the window? If this is interrupted in between atomic_try_cmpxchg()
setting wait_cnt to 0 and sbq_index_atomic_inc() advancing wake_index,
don't we run the same risk as before, of sbitmap_queue_wake_up() from
the interrupt handler getting stuck on that wait_cnt 0?
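
To make that window concrete, here is a rough single-threaded userspace
model, not the kernel code. Everything named model_* is hypothetical, as are
spins_with_irq_at() and SPIN_LIMIT (a cap added only so the demo terminates;
the real loop has no such limit). It also ignores sbq_wake_ptr() skipping
wait states without waiters, so it only shows the case where the stuck wait
state does have waiters. The "interrupt" is just a function call injected at
a chosen point in the reordered sequence.

```c
#include <stdbool.h>

#define NR_WAIT_QUEUES 8    /* stands in for SBQ_WAIT_QUEUES */
#define WAKE_BATCH     4    /* assumed batch size for the model */
#define SPIN_LIMIT     1000 /* hypothetical cap so the demo terminates */

enum irq_point { IRQ_NONE, IRQ_AFTER_CMPXCHG, IRQ_AFTER_INDEX_INC };

struct model_sbq {
	int wait_cnt[NR_WAIT_QUEUES];
	int wake_index;
};

static int model_irq_wake_up(struct model_sbq *sbq);

/*
 * One pass of a simplified __sbq_wake_up() with the reordering from the
 * quoted patch. Returns true when the caller has to retry.
 */
static bool model_wake_up_once(struct model_sbq *sbq, enum irq_point irq,
			       int *irq_spins)
{
	int idx = sbq->wake_index;
	int cur = sbq->wait_cnt[idx];

	if (cur == 0)
		return true;	/* another context is mid-reset: retry */

	sbq->wait_cnt[idx] = cur - 1;	/* models atomic_try_cmpxchg() */
	if (cur - 1 > 0)
		return false;

	/* wait_cnt is now 0, but wake_index still points at this entry */
	if (irq == IRQ_AFTER_CMPXCHG)
		*irq_spins = model_irq_wake_up(sbq);

	/* models sbq_index_atomic_inc() */
	sbq->wake_index = (idx + 1) % NR_WAIT_QUEUES;

	if (irq == IRQ_AFTER_INDEX_INC)
		*irq_spins = model_irq_wake_up(sbq);

	sbq->wait_cnt[idx] = WAKE_BATCH;	/* models atomic_set() */
	return false;
}

/*
 * Models sbitmap_queue_wake_up() called from an interrupt handler: it
 * retries for as long as the pass asks it to, and the interrupted context
 * cannot make progress in the meantime. Returns the number of retries.
 */
static int model_irq_wake_up(struct model_sbq *sbq)
{
	int spins = 0, unused = 0;

	while (model_wake_up_once(sbq, IRQ_NONE, &unused)) {
		if (++spins >= SPIN_LIMIT)
			break;	/* the real loop would never get here */
	}
	return spins;
}

/* Run one wakeup that drops wait_cnt to 0, with the irq at the given point. */
static int spins_with_irq_at(enum irq_point irq)
{
	struct model_sbq sbq = { .wake_index = 0 };
	int irq_spins = 0;
	int i;

	for (i = 0; i < NR_WAIT_QUEUES; i++)
		sbq.wait_cnt[i] = WAKE_BATCH;
	sbq.wait_cnt[0] = 1;	/* the next wakeup takes it to 0 */

	model_wake_up_once(&sbq, irq, &irq_spins);
	return irq_spins;
}
```

In this model, spins_with_irq_at(IRQ_AFTER_CMPXCHG) runs into SPIN_LIMIT,
while spins_with_irq_at(IRQ_AFTER_INDEX_INC) returns 0: moving the increment
earlier shrinks the window but does not close it.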

>
> ---
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index 624fa7f118d1..47bf7882210b 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -634,6 +634,13 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq, int *nr)
>
>  	*nr -= sub;
>
> +	/*
> +	 * Increase wake_index before updating wait_cnt, otherwise concurrent
> +	 * callers can see valid wait_cnt in old waitqueue, which can cause
> +	 * invalid wakeup on the old waitqueue.
> +	 */
> +	sbq_index_atomic_inc(&sbq->wake_index);
> +
>  	/*
>  	 * When wait_cnt == 0, we have to be particularly careful as we are
>  	 * responsible to reset wait_cnt regardless whether we've actually
> @@ -660,13 +667,6 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq, int *nr)
>  	 * of atomic_set().
>  	 */
>  	smp_mb__before_atomic();
> -
> -	/*
> -	 * Increase wake_index before updating wait_cnt, otherwise concurrent
> -	 * callers can see valid wait_cnt in old waitqueue, which can cause
> -	 * invalid wakeup on the old waitqueue.
> -	 */
> -	sbq_index_atomic_inc(&sbq->wake_index);
>  	atomic_set(&ws->wait_cnt, wake_batch);
>
>  	return ret || *nr;
> --