Re: [PATCH] block/mq-deadline: Fix WARN when set async_depth by sysfs

From: Zhiguo Niu
Date: Tue Apr 02 2024 - 01:44:49 EST


On Tue, Apr 2, 2024 at 5:23 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>
> On 3/31/24 17:58, Zhiguo Niu wrote:
> > On Sat, Mar 30, 2024 at 2:08 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >>
> >> On 3/28/24 7:44 PM, Zhiguo Niu wrote:
> >>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> >>> index 02a916b..89c516e 100644
> >>> --- a/block/mq-deadline.c
> >>> +++ b/block/mq-deadline.c
> >>> @@ -646,10 +646,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
> >>> struct request_queue *q = hctx->queue;
> >>> struct deadline_data *dd = q->elevator->elevator_data;
> >>> struct blk_mq_tags *tags = hctx->sched_tags;
> >>> + unsigned int shift = tags->bitmap_tags.sb.shift;
> >>> + unsigned int dd_min_depth = max(1U, 3 * (1U << shift) / 4);
> >>>
> >>> dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
> >>>
> >>> - sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
> >>> + sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd_min_depth);
> >>> }
> >>
> >> The above patch sets min_shallow_depth to the same value as commit
> >> d47f9717e5cf ("block/mq-deadline: use correct way to throttling write
> >> requests"). That commit got reverted because it was causing performance
> >> problems. So the above patch reintroduces the performance problem that
> >> has been fixed by commit 256aab46e316 ("Revert "block/mq-deadline: use
> >> correct way to throttling write requests"").
> > Hi Bart Van Assche,
> >
> > This patch only modifies the initial minimum value of
> > min_shallow_depth and does not change "dd->async_depth",
> > so it will not cause performance problems like the previous patch
> > (d47f9717e5cf ("block/mq-deadline: use correct way to throttling write
> > requests")).
>
Hi Bart Van Assche,
Thanks for your reply.

> Oops, I misread your patch. After having taken another look, my
> conclusions are as follows:
> * sbitmap_queue_min_shallow_depth() is called. This causes
> sbq->wake_batch to be modified but I don't think that it is a proper
> fix for dd_limit_depth().
Yes, it will affect sbq->wake_batch. But judging from the following code:
wake_batch = clamp_t(unsigned int, depth / SBQ_WAIT_QUEUES, 1, SBQ_WAKE_BATCH);
wake_batch will be between 1 and 8. My experiment results are:
1. hw conditions: 8 CPUs, eMMC flash, one hw queue, and
queue_depth=64, sched_tag/nr_requests=128, init dd->async_depth=96
--before this patch, sched_tags info from
/sys/kernel/debug/block/mmcblk0/hctx0:
---depth=128
---bits_per_word=32
---map_nr=4
---wake_batch=8
---min_shallow_depth=96
--after this patch, sched_tags info from
/sys/kernel/debug/block/mmcblk0/hctx0:
---bits_per_word=32
---map_nr=4
---wake_batch=8
---min_shallow_depth=24
wake_batch is unchanged.

2. hw conditions: 8 CPUs, UFS flash, one hw queue, and queue_depth=31,
sched_tag/nr_requests=62, init dd->async_depth=46
--before this patch, sched_tags info from /sys/kernel/debug/block/sda/hctx0:
---depth=62
---bits_per_word=8
---map_nr=8
---wake_batch=7
---min_shallow_depth=46
--after this patch, sched_tags info from /sys/kernel/debug/block/sda/hctx0:
---bits_per_word=8
---map_nr=8
---wake_batch=6
---min_shallow_depth=6
wake_batch changed from 7 to 6, so it seems the patch has little
impact on wake_batch?

> * dd_limit_depth() still assigns a number in the range 1..nr_requests to
> data->shallow_depth while a number in the range 1..(1<<bt->sb.shift)
> should be assigned.
Yes. In order to avoid the performance regression that Harshit
Mogalapalli reported, this patch does not directly modify
dd->async_depth, but the user can still modify dd->async_depth via
sysfs according to their needs, which in turn updates
data->shallow_depth.
>
> > So what are your suggestions for fixing the warning shown in commit
> > msg if dd->async_depth is set by the user from sysfs?
> > thanks
>
> How about the two untested patches below?
>
> Thanks,
>
> Bart.
>
>
> Subject: [PATCH 1/2] block: Call .limit_depth() after .hctx has been set
>
> Prepare for using .hctx in dd_limit_depth().
>
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> ---
> block/blk-mq.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 34060d885c5a..d0db9252bb71 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -435,6 +435,7 @@ __blk_mq_alloc_requests_batch(struct blk_mq_alloc_data *data)
> static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
> {
> struct request_queue *q = data->q;
> + struct elevator_mq_ops *ops = NULL;
> u64 alloc_time_ns = 0;
> struct request *rq;
> unsigned int tag;
> @@ -459,13 +460,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
> */
> if ((data->cmd_flags & REQ_OP_MASK) != REQ_OP_FLUSH &&
> !blk_op_is_passthrough(data->cmd_flags)) {
> - struct elevator_mq_ops *ops = &q->elevator->type->ops;
> + ops = &q->elevator->type->ops;
>
> WARN_ON_ONCE(data->flags & BLK_MQ_REQ_RESERVED);
>
> data->rq_flags |= RQF_USE_SCHED;
> - if (ops->limit_depth)
> - ops->limit_depth(data->cmd_flags, data);
> }
> }
>
> @@ -478,6 +477,9 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
> if (data->flags & BLK_MQ_REQ_RESERVED)
> data->rq_flags |= RQF_RESV;
>
> + if (ops && ops->limit_depth)
> + ops->limit_depth(data->cmd_flags, data);
> +
> /*
> * Try batched alloc if we want more than 1 tag.
> */
>
>
>
I think this part is OK.
> Subject: [PATCH 2/2] block/mq-deadline: Fix the tag reservation code
>
> Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags
> for synchronous requests")
> ---
> block/mq-deadline.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 02a916ba62ee..8e780069d91b 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -621,6 +621,20 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
> return rq;
> }
>
> +/*
> + * 'depth' is a number in the range 0..q->nr_requests. Convert it to a number
> + * in the range 0..(1 << bt->sb.shift) since that is the range expected by
> + * sbitmap_get_shallow().
> + */
> +static int dd_to_word_depth(struct blk_mq_hw_ctx *hctx, unsigned int qdepth)
> +{
> + struct sbitmap_queue *bt = &hctx->sched_tags->bitmap_tags;
> + const unsigned int nrr = hctx->queue->nr_requests;
> +
> + return max(((qdepth << bt->sb.shift) + nrr - 1) / nrr,
> + bt->min_shallow_depth);
Will this still cause the performance regression mentioned by Harshit
Mogalapalli? This changes data->shallow_depth to a value less than
dd->async_depth ("dd->async_depth = max(1UL, 3 * q->nr_requests / 4);").
> +}
> +
> /*
> * Called by __blk_mq_alloc_request(). The shallow_depth value set by this
> * function is used by __blk_mq_get_tag().
> @@ -637,7 +651,7 @@ static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
> * Throttle asynchronous requests and writes such that these requests
> * do not block the allocation of synchronous requests.
> */
> - data->shallow_depth = dd->async_depth;
> + data->shallow_depth = dd_to_word_depth(data->hctx, dd->async_depth);
> }
>
> /* Called by blk_mq_update_nr_requests(). */
> @@ -649,7 +663,7 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
>
> dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
>
> - sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
> + sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
Will this also change min_shallow_depth and thereby affect wake_batch?

My personal opinion is to keep the current dd->async_depth unchanged to
avoid causing a performance regression, while still allowing users to
set it via sysfs, and to fix the WARN mentioned in the commit message.
So how about changing only this part?
- sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
+ sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
Thanks!
> }
>
> /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
>