Re: [PATCH] null_blk: allow teardown on request timeout
From: Ming Lei
Date: Mon Oct 17 2022 - 06:18:57 EST
On Mon, Oct 17, 2022 at 10:04:26AM +0000, Chaitanya Kulkarni wrote:
> On 10/17/22 02:50, Ming Lei wrote:
> > On Mon, Oct 17, 2022 at 09:30:47AM +0000, Chaitanya Kulkarni wrote:
> >>
> >>>> + /*
> >>>> + * Unblock any pending dispatch I/Os before we destroy the device.
> >>>> + * From null_destroy_dev()->del_gendisk() will set GD_DEAD flag
> >>>> + * causing any new I/O from __bio_queue_enter() to fail with -ENODEV.
> >>>> + */
> >>>> + blk_mq_unquiesce_queue(nullb->q);
> >>>> +
> >>>> + null_destroy_dev(nullb);
> >>>
> >>> Destroying the device is never good cleanup for handling timeout/abort;
> >>> it should always be the last resort.
> >>>
> >>
> >> That is exactly why I've added the rq_abort_limit: until the limit
> >> is reached, null_abort_work() will not get scheduled and the device
> >> is not destroyed.
> >
> > I meant that destroying the device should only be done iff the normal
> > abort handler can't recover the device; however, your patch simply
> > destroys the device without running any abort handling.
> >
>
> I did not understand your comment. Can you please elaborate on exactly
> where and which abort handlers need to be called in this patch before
> null_destroy_nullb() ?

In case of a request timeout, something may be wrong that needs to be
recovered.

>
> The objective of this patch is to simulate the teardown scenario
> from the timeout handler so it can get tested on a regular basis with
> null_blk ...

Why does the teardown scenario have to be triggered by a timeout? It
looks like you consider tearing down and destroying the device on
timeout to be a normal and common approach, but I don't think it is:
the device shouldn't be removed if it can still work. I have received
complaints about disks disappearing just because of a request timeout,
for example with nvme-pci.
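
To make the ordering concrete, here is a minimal user-space sketch of
"try recovery first, tear down only as the last resort". It is not
kernel code and not taken from the patch: struct sim_dev,
handle_timeout(), the recover callbacks and the abort_limit field are
all hypothetical names used only for illustration, and the limit here
counts failed recovery attempts, which is an assumption about how such
a limit could be applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of handling one timeout; not a kernel type. */
enum timeout_action {
	ACTION_RECOVERED,	/* abort handling brought the device back */
	ACTION_RETRY_LATER,	/* recovery failed, limit not reached yet */
	ACTION_TEARDOWN,	/* recovery kept failing: remove the device */
};

/* Hypothetical device state for the simulation. */
struct sim_dev {
	unsigned int failed_recoveries;	/* consecutive failed recoveries */
	unsigned int abort_limit;	/* failures tolerated before teardown */
	bool removed;			/* set once the device is torn down */
};

/* Hypothetical recovery callback: returns true if the device works again. */
typedef bool (*recover_fn)(struct sim_dev *dev);

static enum timeout_action handle_timeout(struct sim_dev *dev,
					  recover_fn recover)
{
	assert(dev != NULL && recover != NULL);

	if (dev->removed)
		return ACTION_TEARDOWN;

	/* Always run the normal abort/recovery path first. */
	if (recover(dev)) {
		dev->failed_recoveries = 0;
		return ACTION_RECOVERED;
	}

	/* Recovery failed: keep the device until the limit is reached. */
	if (++dev->failed_recoveries < dev->abort_limit)
		return ACTION_RETRY_LATER;

	/* Last resort only: recovery failed abort_limit times in a row. */
	dev->removed = true;
	return ACTION_TEARDOWN;
}

/* Trivial callbacks for exercising both paths. */
static bool recover_ok(struct sim_dev *dev) { (void)dev; return true; }
static bool recover_fail(struct sim_dev *dev) { (void)dev; return false; }
```

With abort_limit = 2, a timeout handled with recover_ok leaves the
device in place, and only the second consecutive recover_fail marks it
removed. The point is that a working device is never removed just
because a request timed out.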

Thanks,
Ming