Re: [PATCH] scsi: sd: fix crashes in sd_resume_runtime

From: Bart Van Assche
Date: Fri Oct 15 2021 - 13:54:43 EST


On 10/15/21 00:46, Miles Chen wrote:
Crash:
[ 4.695171][ T151] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 4.710577][ T151] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 4.856708][ T151] die+0x16c/0x59c
[ 4.857191][ T151] __do_kernel_fault+0x1e8/0x210
[ 4.857833][ T151] do_page_fault+0xa4/0x654
[ 4.858418][ T151] do_translation_fault+0x6c/0x1b0
[ 4.859083][ T151] do_mem_abort+0x68/0x10c
[ 4.859655][ T151] el1_abort+0x40/0x64
[ 4.860182][ T151] el1h_64_sync_handler+0x54/0x88
[ 4.860834][ T151] el1h_64_sync+0x7c/0x80
[ 4.861395][ T151] sd_resume_runtime+0x20/0x14c
[ 4.862025][ T151] scsi_runtime_resume+0x84/0xe4
[ 4.862667][ T151] __rpm_callback+0x1f4/0x8cc
[ 4.863275][ T151] rpm_resume+0x7e8/0xaa4
[ 4.863836][ T151] __pm_runtime_resume+0xa0/0x110
[ 4.864489][ T151] sd_probe+0x30/0x428
[ 4.865016][ T151] really_probe+0x14c/0x500
[ 4.865602][ T151] __driver_probe_device+0xb4/0x18c
[ 4.866278][ T151] driver_probe_device+0x60/0x2c4
[ 4.866931][ T151] __device_attach_driver+0x228/0x2bc
[ 4.867630][ T151] __device_attach_async_helper+0x154/0x21c
[ 4.868398][ T151] async_run_entry_fn+0x5c/0x1c4
[ 4.869038][ T151] process_one_work+0x3ac/0x590
[ 4.869670][ T151] worker_thread+0x320/0x758
[ 4.870265][ T151] kthread+0x2e8/0x35c
[ 4.870792][ T151] ret_from_fork+0x10/0x20

Cc: Stanley Chu <stanley.chu@xxxxxxxxxxxx>
Fixes: ed4246d37f3b ("scsi: sd: REQUEST SENSE for BLIST_IGN_MEDIA_CHANGE devices in runtime_resume()")
Signed-off-by: Miles Chen <miles.chen@xxxxxxxxxxxx>
---
drivers/scsi/sd.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 523bf2fdc253..fce63335084e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3683,7 +3683,12 @@ static int sd_resume(struct device *dev)
 static int sd_resume_runtime(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
-	struct scsi_device *sdp = sdkp->device;
+	struct scsi_device *sdp;
+
+	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
+		return 0;
+
+	sdp = sdkp->device;
 	if (sdp->ignore_media_change) {
 		/* clear the device's sense data */

Fixing this crash by adding a check inside sd_resume_runtime() seems wrong to me. sd_probe() calls dev_set_drvdata(dev, sdkp) before it has finished, so even with the above patch applied, sd_resume() can still be called before sd_probe() has finished.
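
To illustrate the ordering concern, here is a minimal userspace sketch. It is not kernel code: model_dev, model_disk, model_resume() and model_probe() are hypothetical names, and the probe_done flag only stands in for "sd_probe() has finished". It shows that a NULL check on the driver data stops protecting the resume callback as soon as the pointer has been published.

```c
/* Userspace model only; all names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_disk {
	bool probe_done;	/* set only as the last step of model_probe() */
};

struct model_dev {
	void *drvdata;		/* stands in for dev_set_drvdata()/dev_get_drvdata() */
};

/*
 * Resume callback guarded only by a NULL check, as in the proposed patch.
 * Returns 0 if it bailed out, 1 if it ran against a fully probed disk and
 * -1 if it ran while the probe was still in progress.
 */
static int model_resume(struct model_dev *dev)
{
	struct model_disk *disk = dev->drvdata;

	if (!disk)
		return 0;
	return disk->probe_done ? 1 : -1;
}

/*
 * Probe that publishes the driver data before it has finished.
 * resume_midway simulates a resume request arriving in that window.
 * Returns what the resume callback reported, or 0 if none was issued.
 */
static int model_probe(struct model_dev *dev, struct model_disk *disk,
		       bool resume_midway)
{
	int seen = 0;

	assert(dev && disk);
	disk->probe_done = false;
	dev->drvdata = disk;		/* pointer published early */
	if (resume_midway)
		seen = model_resume(dev);	/* NULL check passes here */
	disk->probe_done = true;	/* probe finishes only now */
	return seen;
}
```

In this model, a resume issued before the probe returns 0, but one issued between the drvdata assignment and the end of the probe returns -1: the check passes although the probe has not finished.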

With which kernel version has this crash been encountered? The scsi_autopm_get_device() / scsi_autopm_put_device() pair added by commit 6fe8c1dbefd6 ("scsi: balance out autopm get/put calls in scsi_sysfs_add_sdev()"; kernel v3.18) should be sufficient to prevent the reported crash.
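
As a rough sketch of why a balanced get/put pair would be expected to help, the toy model below counts how often a resume callback runs. It is a userspace approximation under stated assumptions, not the kernel's runtime PM implementation: model_pm, model_get() and model_put() are hypothetical names, suspend happens immediately when the count drops to zero, and the outer reference is assumed to still be held when the nested get happens.

```c
/* Userspace model only; all names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_pm {
	int usage_count;	/* outstanding get calls */
	bool active;		/* true while the device is resumed */
	int resume_calls;	/* how often the resume callback ran */
};

/* Take a reference; run the resume callback only if the device is suspended. */
static void model_get(struct model_pm *pm)
{
	pm->usage_count++;
	if (!pm->active) {
		pm->resume_calls++;	/* stands in for the driver's runtime resume */
		pm->active = true;
	}
}

/* Drop a reference; suspend once nobody holds the device any more. */
static void model_put(struct model_pm *pm)
{
	assert(pm->usage_count > 0);
	if (--pm->usage_count == 0)
		pm->active = false;
}
```

With an outer model_get() held, a nested model_get() from the probe path finds the device already active and does not run the resume callback again. Only a get issued while no reference is held triggers it.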

Thanks,

Bart.