Re: [PATCH] blk-mq: Properly init bios from blk_mq_alloc_request_hctx()

From: Ming Lei
Date: Tue Oct 25 2022 - 05:00:38 EST


On Tue, Oct 25, 2022 at 3:40 PM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> On 25/10/2022 01:34, Ming Lei wrote:
> >>>> but sometimes we just need to allocate for a specific HW
> >>>> queue...
> >>>>
> >>>> For my usecase of interest, it should not impact if the cpumask of the HW
> >>>> queue goes offline after selecting the cpu in blk_mq_alloc_request_hctx(),
> >>>> so any race is ok ... I think.
> >>>>
> >>>> However it should be still possible to make blk_mq_alloc_request_hctx() more
> >>>> robust. How about using something like work_on_cpu_safe() to allocate and
> >>>> execute the request with blk_mq_alloc_request() on a cpu associated with the
> >>>> HW queue, such that we know the cpu is online and stays online until we
> >>>> execute it? Or also extent to work_on_cpumask_safe() variant, so that we
> >>>> don't need to try all cpus in the mask (to see if online)?
> >>> But all cpus on this hctx->cpumask could become offline.
> >> If all hctx->cpumask are offline then we should not allocate a request and
> >> this is acceptable. Maybe I am missing your point.
> > As you saw, this API has the above problem too, but any one of CPUs
> > may become online later, maybe just during blk_mq_alloc_request_hctx(),
> > and it is easy to cause inconsistence.
> >
> > You didn't share your use case, but for nvme connection request, if it
> > is 1:1 mapping, if any one of CPU becomes offline, the controller
> > initialization could be failed, that isn't good from user viewpoint at
> > all.
>
> My use case is in SCSI EH domain. For my HBA controller of interest, to
> abort an erroneous IO we must send a controller proprietary abort
> command on same HW queue as original command. So we would need to
> allocate this abort request for a specific HW queue.

IMO, it is one bad hw/sw interface.

First such request has to be reserved, since all inflight IOs can be in error.

Second error handling needs to provide forward-progress, and it is supposed
to not require external dependency, otherwise easy to cause deadlock, but
here request from specific HW queue just depends on this queue's cpumask.

Also if it has to be reserved, it can be done as one device/driver private
command, so why bother blk-mq for this special use case?

>
> I mentioned before that if no hctx->cpumask is online then we don't need
> to allocate a request. That is because if no hctx->cpumask is online,
> this means that original erroneous IO must be completed due to nature of
> how blk-mq cpu hotplug handler works, i.e. drained, and then we don't
> actually need to abort it any longer, so ok to not get a request.

No, it is really not OK, if all cpus in hctx->cpumask are offline, you
can't allocate
request on the specified hw queue, then the erroneous IO can't be handled,
then cpu hotplug handler may hang for ever.

Thanks,
Ming