Re: [PATCH] blk-mq: Properly init bios from blk_mq_alloc_request_hctx()

From: Ming Lei
Date: Tue Oct 25 2022 - 05:22:26 EST


On Tue, Oct 25, 2022 at 10:08:10AM +0100, John Garry wrote:
> On 25/10/2022 10:00, Ming Lei wrote:
> > > My use case is in the SCSI EH domain. For my HBA controller of
> > > interest, to abort an erroneous IO we must send a controller
> > > proprietary abort command on the same HW queue as the original
> > > command. So we would need to allocate this abort request for a
> > > specific HW queue.
> > IMO, it is a bad hw/sw interface.
> >
> > First, such a request has to be reserved, since all inflight IOs can be in error.
>
> Right
>
> >
> > Second, error handling needs to guarantee forward progress, and it is
> > supposed not to require any external dependency, otherwise it is easy
> > to cause a deadlock; but here, a request from a specific HW queue
> > depends on that queue's cpumask.
> >
> > Also, if it has to be reserved, it can be done as a device/driver
> > private command, so why bother blk-mq with this special use case?
>
> I have a series for reserved request support, which I will send later.
> Please have a look. And as I mentioned, I would probably not end up
> using blk_mq_alloc_request_hctx() anyway.
>
> >
> > > I mentioned before that if no CPU in hctx->cpumask is online then we
> > > don't need to allocate a request. That is because, by the nature of
> > > how the blk-mq cpu hotplug handler works, the original erroneous IO
> > > must already have completed, i.e. been drained, so we no longer need
> > > to abort it and it is OK not to get a request.
> > No, it is really not OK: if all CPUs in hctx->cpumask are offline, you
> > can't allocate a request on the specified hw queue, so the erroneous
> > IO can't be handled, and the cpu hotplug handler may hang forever.
>
> If the erroneous IO is still in-flight from the blk-mq perspective,
> then how can all CPUs in hctx->cpumask be offline? I thought we
> guarantee that hctx->cpumask cannot go fully offline until the hctx is
> drained.

Yeah, the draining is done before the CPU goes offline. But the drain
simply waits for the inflight IO to complete. If the IO fails during
that wait, you can't allocate such a reserved request for error
handling, and blk_mq_hctx_notify_offline() hangs forever.
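
For context, the constraint is visible in the hctx-pinned allocation
path itself. A minimal sketch of the relevant check, simplified from
blk_mq_alloc_request_hctx() (exact details vary by kernel version):

	/*
	 * Simplified sketch: the request has to be run from a CPU in
	 * hctx->cpumask, so once every CPU in that mask is offline the
	 * allocation must fail, and an EH path that needs this request
	 * cannot make progress.
	 */
	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		goto out_queue_exit;	/* no online CPU for this hctx */
	data.ctx = __blk_mq_get_ctx(q, cpu);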

If you just make it a driver private command, there can't be such an
issue. The block layer is supposed to handle the common cases (normal
IO and passthrough IO); I'd suggest not putting such special cases into
the block layer.
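
As an illustration of that suggestion, a hypothetical driver-private
abort path might look like the sketch below. All my_* identifiers are
invented placeholders, not real driver APIs; the point is only that the
abort slot is owned by the driver per hw queue, so issuing it never
depends on blk-mq request allocation or on hctx->cpumask having an
online CPU:

	/*
	 * Hypothetical sketch: one abort slot is reserved per hw queue
	 * at probe time, so the EH path never allocates from blk-mq.
	 */
	struct my_abort_cmd {
		u16 hwq_index;		/* hw queue the failed command used */
		u16 tag_to_abort;	/* controller tag of the erroneous IO */
	};

	struct my_hba {
		struct my_abort_cmd reserved_abort[MY_MAX_HW_QUEUES];
		/* ... doorbells, locks, etc. ... */
	};

	static int my_hba_send_abort(struct my_hba *hba, u16 hwq, u16 tag)
	{
		struct my_abort_cmd *cmd = &hba->reserved_abort[hwq];

		cmd->hwq_index = hwq;
		cmd->tag_to_abort = tag;
		/*
		 * Post directly to the hw queue's submission ring; no
		 * request allocation, hence no dependency on the queue's
		 * cpumask being online.
		 */
		return my_hba_post_on_hwq(hba, hwq, cmd);
	}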

thanks,
Ming