Re: [PATCH] scsi: storvsc: Fix a panic in the hibernation procedure

From: Bart Van Assche
Date: Wed Apr 22 2020 - 15:56:10 EST


On 4/21/20 11:24 PM, Dexuan Cui wrote:
> Upon suspend, I suppose the other LLDs can not accept I/O either, then
> what do they do when some kernel thread still tries to submit I/O? Do
> they "block" the I/O until resume, or just return an error immediately?

This is my understanding of how other SCSI LLDs handle suspend/resume:
- The ULP driver, e.g. the SCSI sd driver, implements power management support by providing callbacks in struct scsi_driver.gendrv.pm and also in scsi_bus_type.pm. The SCSI sd suspend callbacks flush the device cache and send a STOP command to the device.
- SCSI LLDs for PCIe devices optionally provide pci_driver.suspend and resume callbacks. These callbacks can be used to make the PCIe device enter/leave a power saving state. No new SCSI commands should be submitted after pci_driver.suspend has been called.
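
The ordering described in the two items above can be sketched as a small user-space C model. This is only an illustration, not kernel code: struct toy_hba and every toy_* name are hypothetical, and the flags merely stand in for the cache flush, the STOP command and the PCIe power state.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum toy_result { TOY_OK = 0, TOY_REJECTED = -1 };

struct toy_hba {
	bool cache_dirty;       /* data sits in the device write cache */
	bool spun_down;         /* set once the STOP command was "sent" */
	bool ulp_suspended;     /* upper-level driver finished suspending */
	bool powered_down;      /* adapter is in a power saving state */
	unsigned int late_cmds; /* commands that arrived after LLD suspend */
};

/* Model of submitting a write: rejected once the adapter is powered down. */
static enum toy_result toy_queue_write(struct toy_hba *h)
{
	if (h->powered_down) {
		h->late_cmds++;
		return TOY_REJECTED;
	}
	h->cache_dirty = true;
	return TOY_OK;
}

/* ULP step: flush the cache, then stop the device. */
static enum toy_result toy_ulp_suspend(struct toy_hba *h)
{
	if (h->powered_down)
		return TOY_REJECTED;    /* wrong order: adapter already down */
	h->cache_dirty = false;         /* stands in for the cache flush */
	h->spun_down = true;            /* stands in for the STOP command */
	h->ulp_suspended = true;
	return TOY_OK;
}

/* LLD step: only allowed after the ULP has quiesced the device. */
static enum toy_result toy_lld_suspend(struct toy_hba *h)
{
	if (!h->ulp_suspended)
		return TOY_REJECTED;
	h->powered_down = true;
	return TOY_OK;
}
```

In this model a nonzero late_cmds after toy_lld_suspend() indicates that the two suspend steps ran in the wrong order, not that the LLD step needs extra code.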

> I had a look at drivers/scsi/xen-scsifront.c. It looks like this LLD
> implements a mechanism of marking the device busy upon suspend, so any
> I/O will immediately get an error SCSI_MLQUEUE_HOST_BUSY. IMO the
> disadvantage is: the mechanism of marking the device busy looks
> complex, and the hot path .queuecommand() has to take the
> spinlock shost->host_lock, which should affect the performance.

I think the code you are referring to is the code in scsifront_suspend(). A pointer to that function is stored in a struct xenbus_driver instance. That is a different structure from the ones mentioned above.

Wouldn't it be better to make sure that any hypervisor suspend operations happen only after it is guaranteed that no further SCSI commands will be submitted? Then the hypervisor suspend operations would not have to deal with SCSI commands submitted during or after the hypervisor suspend callback.

> It looks like drivers/scsi/nsp32.c: nsp32_suspend() and
> drivers/scsi/3w-9xxx.c: twa_suspend() do nothing to handle new I/O
> after suspend. I doubt this is correct.

nsp32_suspend() is a PCI suspend callback. If any SCSI commands were submitted after that callback has started, that would mean that the SCSI suspend and PCIe suspend operations are called in the wrong order. I do not agree that code for suspending SCSI commands should be added in nsp32_suspend().

> So it looks to me there is no simple mechanism to handle the scenario
> here, and I guess that's why the scsi_host_block/unblock APIs were
> introduced, and actually there is already a user of the APIs:
> 3d3ca53b1639 ("scsi: aacraid: use scsi_host_(block,unblock) to block I/O").
>
> The aacraid patch says: "This has the advantage that the block layer will
> stop sending I/O to the adapter instead of having the SCSI midlayer
> requeueing I/O internally". It looks like this may imply that using the
> new APIs is encouraged?

I'm fine with using these new functions in device reset handlers. Using these functions in power management handlers seems wrong to me.

> PS, here storvsc has to destroy and re-construct the I/O queues: the
> I/O queues are shared memory ring buffers between the guest and the
> host; in the resume path of the hibernation procedure, the memory
> pages allocated by the 'new' kernel are different from those allocated
> by the 'old' kernel, so before jumping to the 'old' kernel, the 'new'
> kernel must destroy the mapping of the pages, and later after we jump to
> the 'old' kernel, we'll re-create the mapping using the pages allocated
> by the 'old' kernel. Here "create the mapping" means the guest tells
> the host about the physical addresses of the pages.

Thank you for having clarified this. This helps me to understand the HV driver framework better. I think this means that the hv_driver.suspend function should be called at a later time than SCSI suspend.

From Documentation/driver-api/device_link.rst: "By default, the driver core only enforces dependencies between devices that are borne out of a parent/child relationship within the device hierarchy: When suspending, resuming or shutting down the system, devices are ordered based on this relationship, i.e. children are always suspended before their parent, and the parent is always resumed before its children."

Is there a single storvsc_drv instance for all SCSI devices supported by storvsc_drv? Has it been considered to make storvsc_drv the parent device of all SCSI devices created by the storvsc driver?
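
To illustrate the ordering rule quoted from device_link.rst, here is a minimal user-space C sketch. It is not how the driver core is implemented: struct toy_dev, struct toy_log and the toy_* functions are hypothetical names. It only shows that with a parent/child hierarchy the children are suspended before their parent, and the parent is resumed before its children.

```c
#include <stddef.h>

#define TOY_MAX_CHILDREN 4
#define TOY_MAX_LOG 16

/* Hypothetical toy device; this is not the kernel's struct device. */
struct toy_dev {
	const char *name;
	struct toy_dev *children[TOY_MAX_CHILDREN];
	size_t nr_children;
	int suspended;
};

/* Records the order in which devices were suspended or resumed. */
struct toy_log {
	const char *entries[TOY_MAX_LOG];
	size_t len;
};

static void toy_log_add(struct toy_log *log, const char *name)
{
	if (log->len < TOY_MAX_LOG)
		log->entries[log->len++] = name;
}

/* Returns 0 on success, -1 if the parent has no free child slot. */
static int toy_add_child(struct toy_dev *parent, struct toy_dev *child)
{
	if (parent->nr_children >= TOY_MAX_CHILDREN)
		return -1;
	parent->children[parent->nr_children++] = child;
	return 0;
}

/* Suspend all children first, then the device itself. */
static void toy_suspend(struct toy_dev *dev, struct toy_log *log)
{
	for (size_t i = 0; i < dev->nr_children; i++)
		toy_suspend(dev->children[i], log);
	dev->suspended = 1;
	toy_log_add(log, dev->name);
}

/* Resume the device itself first, then all of its children. */
static void toy_resume(struct toy_dev *dev, struct toy_log *log)
{
	dev->suspended = 0;
	toy_log_add(log, dev->name);
	for (size_t i = 0; i < dev->nr_children; i++)
		toy_resume(dev->children[i], log);
}
```

If a toy device standing in for the storvsc device is the parent of two toy SCSI devices, toy_suspend() logs both children before the parent, so the parent's suspend step never sees a child that is still active.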

Thanks,

Bart.