Re: [PATCH 4/4] block: fix ordering between checking QUEUE_FLAG_QUIESCED and adding requests to hctx->dispatch

From: Ming Lei
Date: Fri Aug 23 2024 - 07:28:20 EST


On Sun, Aug 11, 2024 at 06:19:21PM +0800, Muchun Song wrote:
> Supposing the following scenario.
>
> CPU0                                                            CPU1
>
> blk_mq_request_issue_directly()                                 blk_mq_unquiesce_queue()
>     if (blk_queue_quiesced())                                       blk_queue_flag_clear(QUEUE_FLAG_QUIESCED)   3) store
>         blk_mq_insert_request()                                     blk_mq_run_hw_queues()
>             /*                                                          blk_mq_run_hw_queue()
>              * Add request to dispatch list or set bitmap of                if (!blk_mq_hctx_has_pending())     4) load
>              * software queue.                  1) store                        return
>              */
>         blk_mq_run_hw_queue()
>             if (blk_queue_quiesced())           2) load
>                 return
>             blk_mq_sched_dispatch_requests()
>
> The full memory barrier should be inserted between 1) and 2), as well as
> between 3) and 4) to make sure that either CPU0 sees QUEUE_FLAG_QUIESCED is
> cleared or CPU1 sees dispatch list or setting of bitmap of software queue.
> Otherwise, either CPU will not re-run the hardware queue causing starvation.

A memory barrier shouldn't serve as the bug fix for two slow code paths.

One simple fix is to add a helper, blk_queue_quiesced_lock(), and
do the following check on CPU0:

	if (blk_queue_quiesced_lock())
		blk_mq_run_hw_queue();
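
For illustration only, here is a minimal userspace sketch of why checking
the flag under a lock can stand in for the barriers. It assumes the helper
takes the same lock that the unquiesce side holds while clearing the flag.
struct model_queue and the model_*() functions are hypothetical stand-ins,
not kernel APIs, and a pthread mutex plays the role of the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the request queue state. */
struct model_queue {
	pthread_mutex_t lock;	/* models the lock held while the flag changes */
	int quiesce_depth;	/* number of outstanding quiesce calls */
	bool quiesced;		/* models QUEUE_FLAG_QUIESCED */
};

/* Set the flag on the first quiesce; nested calls only bump the depth. */
void model_quiesce(struct model_queue *q)
{
	pthread_mutex_lock(&q->lock);
	if (!q->quiesce_depth++)
		q->quiesced = true;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Clear the flag under the lock when the last quiesce is undone.
 * Returns true if the caller should now re-run the hardware queues.
 */
bool model_unquiesce(struct model_queue *q)
{
	bool run_queue = false;

	pthread_mutex_lock(&q->lock);
	assert(q->quiesce_depth > 0);
	if (!--q->quiesce_depth) {
		q->quiesced = false;
		run_queue = true;
	}
	pthread_mutex_unlock(&q->lock);

	return run_queue;
}

/*
 * Assumed shape of the proposed helper: read the flag while holding
 * the same lock that the unquiesce side takes to clear it.
 */
bool model_queue_quiesced_lock(struct model_queue *q)
{
	bool quiesced;

	pthread_mutex_lock(&q->lock);
	quiesced = q->quiesced;
	pthread_mutex_unlock(&q->lock);

	return quiesced;
}
```

With both sides going through one lock, the two critical sections are
ordered. Either the check on CPU0 runs after the flag is cleared and sees
it cleared, or it runs first, in which case the request added before the
check is visible to the pending check that CPU1 does after dropping the
lock.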


thanks,
Ming