SCSI low level driver: how to prevent I/O upon hibernation?

From: Dexuan Cui
Date: Fri Apr 10 2020 - 01:45:02 EST


Hi all,
Can you please recommend the standard way to prevent the upper-layer SCSI
driver from submitting new I/O requests while the system is hibernating?

Actually I already asked this question on 5/30 last year:
https://marc.info/?l=linux-scsi&m=155918927116283&w=2
At the time I thought all the sdevs were suspended and resumed automatically by
drivers/scsi/scsi_pm.c, and that the low-level SCSI adapter driver (i.e.
hv_storvsc) only needed to suspend/resume the state of the adapter itself.
However, it looks like this is not true, because today I hit the panic below in
a v5.6 Linux VM running on Hyper-V: the 'suspend' part of the hibernation
process finished without any issue, but when the VM was resuming from the 'new'
kernel back to the 'old' kernel, these events happened:

1. the new kernel loaded the saved state from disk to memory.

2. the new kernel quiesced the devices, including the SCSI DVD device
controlled by the hv_storvsc low-level SCSI driver, i.e.
drivers/scsi/storvsc_drv.c: storvsc_suspend() was called and the related VMBus
ring buffer was freed (see the rough sketch after this list).

3. However, disk_events_workfn() -> ... -> cdrom_check_events() -> ...
-> scsi_queue_rq() -> ... -> storvsc_queuecommand() was still trying to
submit I/O commands to the freed VMBus ring buffer, and as a result, a NULL
pointer dereference panic happened.
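
For reference, here is my rough sketch of the suspend-side code that the
in-flight I/O races with. It is paraphrased from storvsc_suspend() in
drivers/scsi/storvsc_drv.c (v5.6) plus the description above, not a verbatim
quote, so please correct me if I misremember the details:

/* Paraphrased sketch of the current storvsc_suspend(), v5.6 -- not verbatim. */
static int storvsc_suspend(struct hv_device *hv_dev)
{
	struct storvsc_device *stor_device = hv_get_drvdata(hv_dev);
	struct Scsi_Host *host = stor_device->host;
	struct hv_host_device *host_dev = shost_priv(host);

	/* wait for the requests already sent to the device to complete */
	storvsc_wait_to_drain(stor_device);

	drain_workqueue(host_dev->handle_error_wq);

	/* closing the channel frees the VMBus ring buffer */
	vmbus_close(hv_dev->channel);

	/* (some per-channel bookkeeping is freed here as well) */

	return 0;
}

Nothing in this path tells the SCSI midlayer to stop queueing new commands, so
scsi_queue_rq() -> storvsc_queuecommand() can still run on another CPU after
vmbus_close(), which is exactly what the backtrace below shows.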

Can anyone please explain the symptom?

With the patch below, I have not been able to reproduce the panic so far.

But I don't see how scsi_block_requests() can reliably prevent new I/O requests
from being issued concurrently on a different CPU -- doesn't the function just
set a flag?
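
For reference, my (possibly incorrect) reading of the SCSI core is that the
flag in question is shost->host_self_blocked, and that the two helpers do
roughly the following (paraphrased from v5.6, from memory):

/* Paraphrased from the SCSI midlayer, v5.6 -- not verbatim. */
void scsi_block_requests(struct Scsi_Host *shost)
{
	shost->host_self_blocked = 1;	/* a plain store, no lock or barrier */
}

void scsi_unblock_requests(struct Scsi_Host *shost)
{
	shost->host_self_blocked = 0;
	scsi_run_host_queues(shost);	/* re-run the queues */
}

As far as I can tell, host_self_blocked is checked in scsi_host_queue_ready()
before a request is dispatched, so I don't see what guarantees that a
storvsc_queuecommand() call that has already passed that check on another CPU
has finished by the time storvsc_suspend() tears down the channel.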

Looking forward to your insights!

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index fb41636519ee..dcfb0a820977 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1949,6 +1949,8 @@ static int storvsc_suspend(struct hv_device *hv_dev)
 	struct Scsi_Host *host = stor_device->host;
 	struct hv_host_device *host_dev = shost_priv(host);
 
+	scsi_block_requests(host);
+
 	storvsc_wait_to_drain(stor_device);
 
 	drain_workqueue(host_dev->handle_error_wq);
@@ -1968,10 +1970,14 @@ static int storvsc_suspend(struct hv_device *hv_dev)
 
 static int storvsc_resume(struct hv_device *hv_dev)
 {
+	struct storvsc_device *stor_device = hv_get_drvdata(hv_dev);
+	struct Scsi_Host *host = stor_device->host;
 	int ret;
 
 	ret = storvsc_connect_to_vsp(hv_dev, storvsc_ringbuffer_size,
				     hv_dev_is_fc(hv_dev));
+	if (!ret)
+		scsi_unblock_requests(host);
 	return ret;
 }


This is the log of the panic:

[ 8.565615] PM: Adding info for No Bus:vcsa63
[ 8.590118] Freezing user space processes ... (elapsed 0.020 seconds) done.
[ 8.619143] OOM killer disabled.
[ 8.645914] PM: Using 3 thread(s) for decompression
[ 8.650805] PM: Loading and decompressing image data (307144 pages)...
[ 8.693765] PM: Image loading progress: 0%
[ 9.286720] PM: Image loading progress: 10%
[ 9.541665] PM: Image loading progress: 20%
[ 9.777528] PM: Image loading progress: 30%
[ 10.062504] PM: Image loading progress: 40%
[ 10.317178] PM: Image loading progress: 50%
[ 10.588564] PM: Image loading progress: 60%
[ 10.796801] PM: Image loading progress: 70%
[ 11.029323] PM: Image loading progress: 80%
[ 11.327868] PM: Image loading progress: 90%
[ 11.650745] PM: Image loading progress: 100%
[ 11.655851] PM: Image loading done
[ 11.659596] PM: hibernation: Read 1228576 kbytes in 2.99 seconds (410.89 MB/s)
[ 11.668702] input input1: type quiesce
[ 11.668741] sr 0:0:0:1: bus quiesce
[ 11.668804] sd 0:0:0:0: bus quiesce
[ 11.672970] input input0: type quiesce
[ 11.698082] scsi target0:0:0: bus quiesce
[ 11.703296] scsi host0: bus quiesce
[ 11.707448] alarmtimer alarmtimer.0.auto: bus quiesce
[ 11.712782] rtc rtc0: class quiesce
[ 11.716560] platform Fixed MDIO bus.0: bus quiesce
[ 11.721911] serial8250 serial8250: bus quiesce
[ 11.727220] simple-framebuffer simple-framebuffer.0: bus quiesce
[ 11.734066] platform pcspkr: bus quiesce
[ 11.738726] rtc_cmos 00:02: bus quiesce
[ 11.743353] serial 00:01: bus quiesce
[ 11.747433] serial 00:00: bus quiesce
[ 11.751654] platform efivars.0: bus quiesce
[ 11.756316] platform rtc-efi.0: bus quiesce
[ 11.760883] platform HYPER_V_GEN_COUN:00: bus quiesce
[ 11.766255] platform VMBUS:00: bus quiesce
[ 11.770668] platform PNP0003:00: bus quiesce
[ 11.781730] hv_storvsc bf78936f-7d8f-45ce-ab03-6c341452e55d: noirq bus quiesce
[ 11.796479] hv_netvsc dda5a2be-b8b8-4237-b330-be8a516a72c0: noirq bus quiesce
[ 11.804042] BUG: kernel NULL pointer dereference, address: 0000000000000090
[ 11.804996] #PF: supervisor read access in kernel mode
[ 11.804996] #PF: error_code(0x0000) - not-present page
[ 11.804996] PGD 0 P4D 0
[ 11.804996] Oops: 0000 [#1] SMP PTI
[ 11.804996] CPU: 18 PID: 353 Comm: kworker/18:1 Not tainted 5.6.0+ #1
[ 11.804996] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 05/16/2019
[ 11.804996] Workqueue: events_freezable_power_ disk_events_workfn
[ 11.804996] RIP: 0010:storvsc_queuecommand+0x261/0x714 [hv_storvsc]
[ 11.804996] Code: ...
[ 11.804996] RSP: 0018:ffffa331c2347af0 EFLAGS: 00010246
[ 11.804996] RAX: 0000000000000000 RBX: ffff8e6a32cec5a0 RCX: 0000000000000000
[ 11.804996] RDX: 0000000000000012 RSI: ffff8e6a32cec3e0 RDI: ffff8e6a32cec710
[ 11.804996] RBP: ffff8e6b6d58c800 R08: 0000000000000010 R09: ffff8e6a32aa6060
[ 11.804996] R10: ffffc8a8c4c81980 R11: 0000000000000000 R12: 0000000000000012
[ 11.804996] R13: 0000000000000012 R14: ffff8e6a32cec710 R15: ffff8e6a32cec3d8
[ 11.804996] FS: 0000000000000000(0000) GS:ffff8e6a42c80000(0000) knlGS:0000000000000000
[ 11.804996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.804996] CR2: 0000000000000090 CR3: 000000013a428006 CR4: 00000000003606e0
[ 11.804996] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 11.804996] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 11.804996] Call Trace:
[ 11.804996] scsi_queue_rq+0x593/0xa10
[ 11.804996] blk_mq_dispatch_rq_list+0x8d/0x510
[ 11.804996] blk_mq_sched_dispatch_requests+0xed/0x170
[ 11.804996] __blk_mq_run_hw_queue+0x55/0x110
[ 11.804996] __blk_mq_delay_run_hw_queue+0x141/0x160
[ 11.804996] blk_mq_sched_insert_request+0xc3/0x170
[ 11.804996] blk_execute_rq+0x4b/0xa0
[ 11.804996] __scsi_execute+0xeb/0x250
[ 11.804996] sr_check_events+0x9f/0x270 [sr_mod]
[ 11.804996] cdrom_check_events+0x1a/0x30 [cdrom]
[ 11.804996] sr_block_check_events+0xcc/0x110 [sr_mod]
[ 11.804996] disk_check_events+0x68/0x160
[ 11.804996] process_one_work+0x20c/0x3d0
[ 11.804996] worker_thread+0x2d/0x3e0
[ 11.804996] kthread+0x10c/0x130
[ 11.804996] ret_from_fork+0x35/0x40

Thanks,
-- Dexuan