Re: [PATCH 1/1] block: System crashes when cpu hotplug + bouncing port

From: Ming Lei
Date: Mon Jun 28 2021 - 05:59:55 EST


On Mon, Jun 28, 2021 at 11:07:03AM +0200, Daniel Wagner wrote:
> Hi Wen,
>
> On Sun, Jun 27, 2021 at 10:14:32PM -0500, wenxiong@xxxxxxxxxxxxxxxxxx wrote:
> > @@ -468,8 +467,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> > data.hctx = q->queue_hw_ctx[hctx_idx];
> > if (!blk_mq_hw_queue_mapped(data.hctx))
> > goto out_queue_exit;
> > - cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
> > - data.ctx = __blk_mq_get_ctx(q, cpu);
> > + data.ctx = __blk_mq_get_ctx(q, hctx_idx);
>
> hctx_idx is just an index, not a CPU id. In this scenario, the hctx_idx
> used to lookup the context happens to be valid. I am still a bit
> confused why [1] doesn't work for this scenario.

[1] is fine from the blk-mq viewpoint, but nvme needs to improve its
failure handling; otherwise, in the worst case, no io queues may get
connected at all.
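
To make that failure mode concrete, here is a rough sketch of the usual
fabrics-driver pattern for starting io queues. This is illustration
only, with made-up names (demo_start_io_queues, demo_connect_io_queue),
not actual driver code:

	/*
	 * Hedged sketch: connecting each io queue goes through
	 * blk_mq_alloc_request_hctx() internally, and one failed
	 * connect aborts the whole loop, so in the worst case zero
	 * io queues end up connected.
	 */
	static int demo_start_io_queues(struct demo_ctrl *ctrl)
	{
		int i, ret;

		for (i = 1; i < ctrl->queue_count; i++) {
			ret = demo_connect_io_queue(ctrl, i);
			if (ret)
				return ret;	/* no retry/fallback today */
		}
		return 0;
	}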

>
> As Ming pointed out in [2] we need to update cpumask for CPU hotplug

I mentioned there is still a hole with your patch; I did not mean that
we need to update the cpumask.

The root cause is that blk-mq does not handle tag allocation from a
specified hctx (blk_mq_alloc_request_hctx()) well: blk-mq assumes that
no request allocation can cross the hctx going inactive/offline, see
blk_mq_hctx_notify_offline() and blk_mq_get_tag(). Either the allocated
request is completed, or new allocation is prevented, before the current
hctx becomes inactive (i.e. every CPU in hctx->cpumask is offline).
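
To spell out where that assumption bites, here is the relevant upstream
snippet (slightly abridged, comments added by me for illustration):

	/* abridged from blk_mq_alloc_request_hctx() */
	data.hctx = q->queue_hw_ctx[hctx_idx];
	if (!blk_mq_hw_queue_mapped(data.hctx))
		goto out_queue_exit;
	/*
	 * Nothing prevents every CPU in hctx->cpumask from going
	 * offline after the check above: cpumask_first_and() then
	 * returns nr_cpu_ids, and __blk_mq_get_ctx(q, nr_cpu_ids)
	 * indexes an invalid per-cpu ctx -- the crash reported with
	 * cpu hotplug + bouncing port.
	 */
	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
	data.ctx = __blk_mq_get_ctx(q, cpu);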

I tried [1] to move connecting the io queue into the driver and kill
blk_mq_alloc_request_hctx() to address this issue, but there is a corner
case (timeout) that is not covered.

My understanding is that NVMe requires connecting the io queue to
succeed no matter whether the hctx is inactive or not. Sagi, correct me
if I am wrong.


[1]
https://lore.kernel.org/linux-block/fda43a50-a484-dde7-84a1-94ccf9346bdd@xxxxxxxxxxxx/T/#m1e902f69e8503f5e6202945b8b79e5b7252e3689

Thanks,
Ming