RE: SCSI low level driver: how to prevent I/O upon hibernation?

From: Dexuan Cui
Date: Tue Apr 14 2020 - 13:10:13 EST


> From: linux-hyperv-owner@xxxxxxxxxxxxxxx
> <linux-hyperv-owner@xxxxxxxxxxxxxxx> On Behalf Of Dexuan Cui
> Sent: Saturday, April 11, 2020 10:20 AM
> To: Bart Van Assche <bvanassche@xxxxxxx>; Ming Lei
> <tom.leiming@xxxxxxxxx>; Martin K. Petersen <martin.petersen@xxxxxxxxxx>
> Cc: linux-scsi@xxxxxxxxxxxxxxx; Christoph Hellwig <hch@xxxxxx>;
> linux-hyperv@xxxxxxxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; vkuznets
> <vkuznets@xxxxxxxxxx>; Michael Kelley <mikelley@xxxxxxxxxxxxx>; KY
> Srinivasan <kys@xxxxxxxxxxxxx>; Olaf Hering <olaf@xxxxxxxxx>; Stephen
> Hemminger <sthemmin@xxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: SCSI low level driver: how to prevent I/O upon hibernation?
>
> > From: Bart Van Assche <bvanassche@xxxxxxx>
> > Sent: Saturday, April 11, 2020 8:03 AM
> >
> > On 2020-04-10 23:01, Dexuan Cui wrote:
> > > Please let me know if the change to scsi_device_set_state() is OK.
> >
> > Hadn't Ming Lei already root-caused this issue for you? From his e-mail:
> > "So you can't free related vmbus ringbuffer cause BLK_MQ_REQ_PREEMPT
> > request is still to be handled."
> >
> > Please follow that advice.
> >
> > Bart.
>
> Hi Bart, Ming,
> I agree Ming has root-caused the issue, but it looks the advice can not
> apply to the hibernation scenario. :-)
>
> Sorry for my lack of knowledge of the complex SCSI subsystems -- could
> you please elaborate on what a low level SCSI device driver (like hv_storvsc)
> should do to safely save/restore the device state upon hibernation?
>
> The nature of "free related vmbus ringbuffer" in hv_storvsc is that: the
> driver can not handle any I/O after the device is quiesced in
> software_resume() -> load_image_and_restore() -> hibernation_restore()
> -> dpm_suspend_start() -> ... -> storvsc_suspend(). BTW, after the SCSI
> device is quiesced, the hibernation's resume path also quiesces other
> devices, disables non-boot CPUs, and finally jumps to the old kernel's
> entry point where the old kernel was suspended, and the old kernel will
> resume back.
>
> My intuition is that the upper level SCSI layer should provide an API to
> flush any pending I/O and block any new I/O after a SCSI device is
> "quiesced"? -- it looks scsi_host_block()/scsi_host_unblock() are such
> APIs, which are already used by
> drivers/scsi/aacraid/linit.c: aac_suspend()/aac_resume().
>
> That's why I proposed the patch of the same thing for hv_storvsc, and
> it looks the patch works for me: without the patch I can easily hit the
> panic I reported in the first email; with the patch, I have successfully
> done more than 30 rounds of hibernation without the panic.
>
> However, it looks you implied my intuition is wrong and it's *expected*
> that the upper level SCSI layer can still submit I/O requests with the
> BLK_MQ_REQ_PREEMPT flag after the SCSI device is "quiesced"?
>
> If this is the case, then how is hv_storvsc supposed to handle the I/O
> after the SCSI device is quiesced? I can keep the related vmbus ringbuffer,
> but the real issue is: the driver is unable to handle any I/O at all since the
> vmbus connection to the Hyper-V host is disconnected soon, after
> the SCSI device is quiesced. Should hv_storvsc return an error for such I/O,
> or block such I/O until the SCSI device is resumed? -- These don't look
> good to me, and I really think the upper level SCSI layer should provide
> an API to block any new I/O after a SCSI device is "quiesced" -- again, can
> you please clarify if scsi_host_block()/scsi_host_unblock() are such APIs?
>
> Looking forward to your replies!
>
> Thanks,
> -- Dexuan

It looks scsi_host_block() and scsi_host_unblock() are just the APIs
I need. I have run my hibernation test with the APIs more than 1000 rounds
and the VM is still running fine without the panic I reported previously when
I didn't use the APIs.

FYI: the aacraid driver is the first user of the APIs:

commit 3d3ca53b163914c1397289d0c2ee6d2f52362dcc
Author: Hannes Reinecke <hare@xxxxxxx>
Date: Fri Feb 28 08:53:14 2020 +0100

scsi: aacraid: use scsi_host_(block,unblock) to block I/O

Use scsi_host_block() and scsi_host_unblock() instead of
scsi_block_requests()/scsi_unblock_requests() to block and unblock I/O.
This has the advantage that the block layer will stop sending I/O to the
adapter instead of having the SCSI midlayer requeueing I/O internally.

Link: https://lore.kernel.org/r/20200228075318.91255-10-hare@xxxxxxx
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Acked-by: Balsundar P < Balsundar.P@xxxxxxxxxxxxx>
Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>

BTW, I suspect aac_suspend() -> scsi_host_block() is not relly working
because scsi_host_block() doesn't allow a state transition from
SDEV_QUIESCE to SDEV_BLOCK and returns -EINVAL for that. aac_suspend()
should check the return value of scsi_host_block(), and my change to
scsi_device_set_state() should be needed to avoid the -EINVAL error.

If there is no objection, I plan to send some patches later.

Thanks,
-- Dexuan