Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts handling requests

From: Ming Lei
Date: Sat Jun 27 2015 - 12:08:28 EST


On Sat, Jun 27, 2015 at 1:14 AM, Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
> Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>> 2015-06-26 0:40 GMT+09:00 Ming Lei <tom.leiming@xxxxxxxxx>:
>> > On Thu, 25 Jun 2015 21:49:43 +0900
>> > Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>> >> For example, there is a single hw queue (hctx) and two CPU queues
>> >> (ctx0 for CPU0, and ctx1 for CPU1). Now CPU1 is just onlined and
>> >> a request is inserted into ctx1->rq_list and set bit0 in pending
>> >> bitmap as ctx1->index_hw is still zero.
>> >>
>> >> And then while running hw queue, flush_busy_ctxs() finds bit0 is set
>> >> in pending bitmap and tries to retrieve requests in
>> >> hctx->ctxs[0].rq_list. But htx->ctxs[0] is ctx0, so the request in
>> >> ctx1->rq_list is ignored.
>> >
>> > Per current design, the request should have been inserted into ctx0 instead
>> > of ctx1 because ctx1 isn't mapped yet even though ctx1->cpu becomes ONLINE.
>> >
>> > So how about the following patch? which looks much simpler.
>>
>> OK, I'll try this patch to see if the problem disappears.
>
> This doesn't fix the problem. Because:
>
>> > diff --git a/block/blk-mq.c b/block/blk-mq.c
>> > index f537796..2f45b73 100644
>> > --- a/block/blk-mq.c
>> > +++ b/block/blk-mq.c
>> > @@ -1034,7 +1034,12 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
>> > struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
>> >
>> > current_ctx = blk_mq_get_ctx(q);
>> > - if (!cpu_online(ctx->cpu))
>> > + /*
>> > + * ctx->cpu may become ONLINE but ctx hasn't been mapped to
>> > + * hctx yet because there is a tiny race window between
>> > + * ctx->cpu ONLINE and doing the remap
>> > + */
>> > + if (!blk_mq_ctx_mapped(ctx))
>> > rq->mq_ctx = ctx = current_ctx;
>
> The process running on just onlined CPU1 in the above example can
> satisfy this condition and current_ctx will be ctx1. So the same
> scenario can happen (the request is ignored by flush_busy_ctxs).

Yeah, that is possible, and it should be bug of blk_mq_insert_request(),
because the function supposes that the current ctx is mapped.

Then I think the approach in your 1st email of this thread may be
good one, and looks we have to make the remapping during
CPU UP_PREPARE notifier.

>
> I found simple alternative solution that assigns the offline CPUs
> unique ctx->index_hw.

This solution looks simpler, but it may not be correct.

>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 594eea0..a8fcfbf 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1787,10 +1787,11 @@ static void blk_mq_map_swqueue(struct
> request_queue *q)
> */
> queue_for_each_ctx(q, ctx, i) {
> /* If the cpu isn't online, the cpu is mapped to first hctx */
> - if (!cpu_online(i))
> - continue;
> + if (!cpu_online(i) && cpu_possible(i))

The cpu_possible() check isn't needed.

> + hctx = q->mq_ops->map_queue(q, 0);
> + else
> + hctx = q->mq_ops->map_queue(q, i);

The above change supposes that all offline CPUs(ctxs)
share the same 'hctx' mapped from CPU 0, and that is
obvious wrong.

All offline CPUs should have shared the 1st 'hctx' instead
of the 'hctx' mapped from CPU 0.

>
> - hctx = q->mq_ops->map_queue(q, i);
> cpumask_set_cpu(i, hctx->cpumask);

CPU i shouldn't have been set on hctx->cpumask in this approach
if it isn't online.

> ctx->index_hw = hctx->nr_ctx;
> hctx->ctxs[hctx->nr_ctx++] = ctx;

I am not sure the current code is ready about adding offline 'ctx' into
'hctx', and there are some observalbe effects at least:

- blk_mq_run_hw_queue() can run even the hctx hasn't mapped
'ctx'
- the offline ctx kobject will be exposed to user space in sysfs
- blk_mq_hw_queue_mapped() may becomes always true
- set->tags[i](request entries) may not be freed even if there
aren't mapped 'ctx' in one 'hctx'

--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/