Re: [PATCH v2 0/3] Fix some starvation problems in block layer

From: Muchun Song
Date: Mon Sep 09 2024 - 22:50:25 EST

> On Sep 3, 2024, at 16:16, Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote:
>
> We encountered a problem on our servers where hundreds of processes in
> UNINTERRUPTIBLE state were all waiting in the WBT wait queue, and the hung
> task detector logged many messages about "blocked for more than 122
> seconds". The call trace is as follows:
>
> Call Trace:
> __schedule+0x959/0xee0
> schedule+0x40/0xb0
> io_schedule+0x12/0x40
> rq_qos_wait+0xaf/0x140
> wbt_wait+0x92/0xc0
> __rq_qos_throttle+0x20/0x30
> blk_mq_make_request+0x12a/0x5c0
> generic_make_request_nocheck+0x172/0x3f0
> submit_bio+0x42/0x1c0
> ...
>
> The WBT module is used to throttle buffered writeback; it blocks any
> buffered writeback IO request until the previous inflight IOs have been
> completed. So I checked the inflight IO counter. It was one, meaning one
> IO request had been submitted to the downstream interface, such as the
> block core layer or the device driver (the virtio_blk driver in our case).
> We needed to figure out why the inflight IO was not completed in time. I
> confirmed that all the virtio ring buffers of virtio_blk were empty and
> the hardware dispatch list had one IO request, so the root cause was not
> related to the block device or the virtio_blk driver, since the driver
> had never received that IO request.
>
> We know that the block core layer can submit IO requests to the driver
> through a kworker (the callback function is blk_mq_run_work_fn). I thought
> the kworker might be blocked on some other resource, so that the callback
> was not invoked in time. So I checked all the kworkers and workqueues and
> confirmed there was no pending work on any kworker or workqueue.
>
> Putting all the investigation results together, the problem should be that
> the block core layer missed a chance to submit that IO request. After
> studying the code, I found some scenarios which could cause the problem.

Hi Jens Axboe,

May I ask if you have any suggestions for these fixes, or whether they
could be merged?

Thanks,
Muchun

>
> Changes in v2:
> - Collect Reviewed-by tag from Ming Lei.
> - Use the barrier-less approach suggested by Ming Lei to fix the
>   QUEUE_FLAG_QUIESCED ordering problem.
> - Apply the new approach to the BLK_MQ_S_STOPPED ordering fix as well, for
>   easier maintenance.
> - Add a Fixes tag to each patch.
>
> Muchun Song (3):
>   block: fix missing dispatching request when queue is started or
>     unquiesced
>   block: fix ordering between checking QUEUE_FLAG_QUIESCED and adding
>     requests
>   block: fix ordering between checking BLK_MQ_S_STOPPED and adding
>     requests
>
> block/blk-mq.c | 55 ++++++++++++++++++++++++++++++++++++++------------
> block/blk-mq.h | 13 ++++++++++++
> 2 files changed, 55 insertions(+), 13 deletions(-)
>
> --
> 2.20.1
>
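
For readers who want a concrete picture of the kind of ordering problem named
in the titles of patches 2 and 3, here is a minimal sketch in plain C. It is a
toy model, not kernel code, and it does not reproduce what the patches do: the
store-buffer model, the names VAR_QUIESCED and VAR_PENDING, and all helper
functions are assumptions made up for illustration. It assumes the race has
the classic store-buffering shape: one CPU adds a request and then checks the
"quiesced" flag, while another CPU clears the flag and then checks for pending
requests. If each CPU's store is still sitting in its local store buffer when
the other CPU loads, both checks see stale values and nobody dispatches.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model (hypothetical, NOT kernel code): two shared variables and a
 * one-entry store buffer per CPU, enough to show a store-buffering race. */
enum { VAR_QUIESCED, VAR_PENDING, NR_VARS };

struct cpu {
	bool buffered;	/* is a store waiting in the local buffer? */
	int var;	/* variable the buffered store targets */
	int val;	/* value the buffered store will write */
};

struct machine {
	int mem[NR_VARS];	/* globally visible memory */
	struct cpu cpu[2];
};

static void machine_init(struct machine *m)
{
	memset(m, 0, sizeof(*m));
	m->mem[VAR_QUIESCED] = 1;	/* queue starts quiesced */
	m->mem[VAR_PENDING] = 0;	/* no request queued yet */
}

/* Drain the CPU's store buffer to memory; stands in for a full barrier. */
static void cpu_flush(struct machine *m, int id)
{
	struct cpu *c = &m->cpu[id];

	if (c->buffered) {
		m->mem[c->var] = c->val;
		c->buffered = false;
	}
}

/* A store stays in the local buffer until it is flushed. */
static void cpu_store(struct machine *m, int id, int var, int val)
{
	assert(var >= 0 && var < NR_VARS);
	cpu_flush(m, id);	/* one-entry buffer: drain any older store */
	m->cpu[id].buffered = true;
	m->cpu[id].var = var;
	m->cpu[id].val = val;
}

/* A load sees the CPU's own buffered store, otherwise global memory. */
static int cpu_load(const struct machine *m, int id, int var)
{
	const struct cpu *c = &m->cpu[id];

	if (c->buffered && c->var == var)
		return c->val;
	return m->mem[var];
}

/* Returns true if at least one side decides to dispatch the request. */
static bool simulate(bool use_barrier)
{
	struct machine m;
	bool cpu0_dispatches, cpu1_dispatches;

	machine_init(&m);

	/* CPU0 (submitter): add a request ... */
	cpu_store(&m, 0, VAR_PENDING, 1);
	/* CPU1 (unquiescer): clear the quiesced flag ... */
	cpu_store(&m, 1, VAR_QUIESCED, 0);

	if (use_barrier) {
		/* Make each store globally visible before the later load. */
		cpu_flush(&m, 0);
		cpu_flush(&m, 1);
	}

	/* ... CPU0 then checks the flag, CPU1 then checks for pending work. */
	cpu0_dispatches = !cpu_load(&m, 0, VAR_QUIESCED);
	cpu1_dispatches = cpu_load(&m, 1, VAR_PENDING) != 0;

	return cpu0_dispatches || cpu1_dispatches;
}
```

In this model, simulate(false) leaves the request stranded because both loads
read stale memory, while simulate(true) lets CPU0 see the cleared flag and
dispatch. The barrier variant is shown only because it is the simplest way to
make the model deterministic; as noted in the v2 changelog, the series itself
uses a barrier-less approach.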