Re: [PATCH v5] sbitmap: fix io hung due to race on sbitmap_word::cleared

From: YangYang
Date: Mon Jul 08 2024 - 03:55:00 EST


On 2024/7/5 18:58, Pavel Begunkov wrote:
On 7/3/24 13:28, YangYang wrote:
On 2024/7/3 19:55, Pavel Begunkov wrote:
On 7/3/24 03:28, Yang Yang wrote:
Configuration for sbq:
   depth=64, wake_batch=6, shift=6, map_nr=1

1. There are 64 requests in progress:
   map->word = 0xFFFFFFFFFFFFFFFF
2. After all the 64 requests complete, and no more requests come:
   map->word = 0xFFFFFFFFFFFFFFFF, map->cleared = 0xFFFFFFFFFFFFFFFF
3. Now two tasks try to allocate requests:
   T1:                                       T2:
   __blk_mq_get_tag                          .
   __sbitmap_queue_get                       .
   sbitmap_get                               .
   sbitmap_find_bit                          .
   sbitmap_find_bit_in_word                  .
   __sbitmap_get_word  -> nr=-1              __blk_mq_get_tag
   sbitmap_deferred_clear                    __sbitmap_queue_get
   /* map->cleared=0xFFFFFFFFFFFFFFFF */     sbitmap_find_bit
     if (!READ_ONCE(map->cleared))           sbitmap_find_bit_in_word
       return false;                         __sbitmap_get_word -> nr=-1
     mask = xchg(&map->cleared, 0)           sbitmap_deferred_clear
     atomic_long_andnot()                    /* map->cleared=0 */
                                               if (!(map->cleared))
                                                 return false;
                                      /*
                                       * map->cleared is cleared by T1
                                       * T2 fail to acquire the tag
                                       */

4. T2 is the sole tag waiter. When T1 puts the tag, T2 cannot be woken
up due to the wake_batch being set at 6. If no more requests come, T1
will wait here indefinitely.

This patch achieves two purposes:
1. Check on ->cleared and update on both ->cleared and ->word need to
be done atomically, and using spinlock could be the simplest solution.
So revert commit 661d4f55a794 ("sbitmap: remove swap_lock"), which
may cause potential race.

2. Add extra check in sbitmap_deferred_clear(), to identify whether
->word has free bits.

Fixes: 661d4f55a794 ("sbitmap: remove swap_lock")

Is it blamed right? Considering that the revert alone doesn't fix
the problem, it sounds like the 2nd step might need to be ported
to kernels even without the blamed commit.


Got it. How would you feel about removing
"Fixes: 661d4f55a794 ("sbitmap: remove swap_lock")" and modifying
the commit message as follows?

We need a fixes tag, which commit caused it? Can it happen prior
to ea86ea2cdced ("sbitmap: ammortize cost of clearing bits")?

The initial introduction of the sbitmap_deferred_clear function also
introduced a potential race condition.
So the fixes tag should be:
Fixes: ea86ea2cdced ("sbitmap: ammortize cost of clearing bits")

Thanks.


This patch achieves two purposes:
1. Check on ->cleared and update on both ->cleared and ->word need to
be done atomically, and using spinlock could be the simplest solution.
2. Add extra check in sbitmap_deferred_clear(), to identify whether
->word has free bits.

Thanks.