Re: [PATCH] blk-mq: Put driver tag in blk_mq_dispatch_rq_list() when no budget

From: Doug Anderson
Date: Thu Apr 23 2020 - 18:42:54 EST


Hi,

On Mon, Apr 20, 2020 at 1:23 AM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> On 18/04/2020 03:43, Bart Van Assche wrote:
> > On 2020-04-16 04:18, John Garry wrote:
> >> If in blk_mq_dispatch_rq_list() we find no budget, then we break out of
> >> the dispatch loop, but the request may keep the driver tag, evaluated
> >> in 'nxt' in the previous loop iteration.
> >>
> >> Fix by putting the driver tag for that request.
> >>
> >> Signed-off-by: John Garry <john.garry@xxxxxxxxxx>
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 8e56884fd2e9..a7785df2c944 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -1222,8 +1222,10 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> >>  		rq = list_first_entry(list, struct request, queuelist);
> >>
> >>  		hctx = rq->mq_hctx;
> >> -		if (!got_budget && !blk_mq_get_dispatch_budget(hctx))
> >> +		if (!got_budget && !blk_mq_get_dispatch_budget(hctx)) {
> >> +			blk_mq_put_driver_tag(rq);
> >>  			break;
> >> +		}
> >>
> >>  		if (!blk_mq_get_driver_tag(rq)) {
> >>  			/*
> >
> > Is this something that can only happen if q->mq_ops->queue_rq(hctx, &bd)
> > returns another value than BLK_STS_OK, BLK_STS_RESOURCE and
> > BLK_STS_DEV_RESOURCE?
>
> Right, as that case is handled in blk_mq_handle_dev_resource()
>
> > If so, please add a comment in the source code
> > that explains this.
>
> So important that we should now do this in an extra patch?
>
> >
> > Is this perhaps a bug fix for 0bca799b9280 ("blk-mq: order getting
> > budget and driver tag")? If so, please mention this and add Cc tags for
> > the people who were Cc-ed on that patch.
>
> So it looks like 0bca799b9280 had a flaw, but I am not sure if anything
> got broken there and worthy of stable backport.
>
> I found this issue while debugging Ming's blk-mq cpu hotplug patchset,
> which I feel is ready to merge.
>
> Having said that, this nasty issue did take > 1 day for me to debug...
> so let me know.

As per the above conversation, presumably this should go to stable
then for any kernel that has commit 0bca799b9280 ("blk-mq: order
getting budget and driver tag")? For instance, I think 4.19 would be
affected? When I picked it there I got a conflict due to not having
commit ea4f995ee8b8 ("blk-mq: cache request hardware queue mapping")
but I think it's just a context collision and easy to resolve.

I'm no expert in the block code, but I posted my backport to 4.19 at
<https://crrev.com/c/2163313>. I'm happy to send an email as a patch
to the list too or double-check that someone else's conflict
resolution matches mine.
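
For anyone checking a backport, here is how I understand the leak,
written as a userspace toy model. This is only a sketch: every toy_*
name below is made up for illustration and is not the kernel API, and
it assumes a dispatched request completes (and frees its tag)
immediately, which the real code does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; hypothetical names, NOT the kernel's structures. */
struct toy_rq {
	bool has_tag;		/* request currently holds a driver tag */
};

struct toy_hctx {
	int budget;		/* dispatch budget units left */
	int free_tags;		/* driver tags left */
};

static bool toy_get_budget(struct toy_hctx *h)
{
	if (h->budget == 0)
		return false;
	h->budget--;
	return true;
}

static bool toy_get_tag(struct toy_hctx *h, struct toy_rq *rq)
{
	if (rq->has_tag)
		return true;	/* already tagged, e.g. as 'nxt' earlier */
	if (h->free_tags == 0)
		return false;
	h->free_tags--;
	rq->has_tag = true;
	return true;
}

static void toy_put_tag(struct toy_hctx *h, struct toy_rq *rq)
{
	if (rq->has_tag) {
		rq->has_tag = false;
		h->free_tags++;
	}
}

/* Returns how many undispatched requests still hold a tag afterwards. */
static int toy_dispatch(struct toy_hctx *h, struct toy_rq *rqs, size_t n,
			bool apply_fix)
{
	size_t i;
	int leaked = 0;

	assert(h != NULL && rqs != NULL);

	for (i = 0; i < n; i++) {
		struct toy_rq *rq = &rqs[i];

		if (!toy_get_budget(h)) {
			/* The patch: drop a tag taken as 'nxt' last time. */
			if (apply_fix)
				toy_put_tag(h, rq);
			break;
		}
		if (!toy_get_tag(h, rq)) {
			h->budget++;	/* give the budget back */
			break;
		}
		/* Tag the next request early, like the 'nxt' logic. */
		if (i + 1 < n)
			toy_get_tag(h, &rqs[i + 1]);

		/* Simplification: dispatch completes at once, tag freed. */
		toy_put_tag(h, rq);
	}

	for (; i < n; i++)
		leaked += rqs[i].has_tag;
	return leaked;
}

/* Two requests, budget for only one of them, two tags available. */
static int toy_leaked_tags(bool apply_fix)
{
	struct toy_hctx h = { .budget = 1, .free_tags = 2 };
	struct toy_rq rqs[2] = { { false }, { false } };

	return toy_dispatch(&h, rqs, 2, apply_fix);
}
```

In this model toy_leaked_tags(false) leaves the second request holding
a tag after the loop breaks on budget, while toy_leaked_tags(true)
does not, which is the behavior I'd expect a correct conflict
resolution to preserve.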

-Doug