Re: Regression 4.17-rc1: SSD doesn properly resume causing system hang (NULL pointer dereference)

From: Bart Van Assche
Date: Tue Apr 24 2018 - 14:27:12 EST


On Tue, 2018-04-24 at 19:37 +0200, Paul Menzel wrote:
> On 04/24/18 19:31, Bart Van Assche wrote:
> Here it is, pasted as citation, as otherwise Thunderbird would wrap the
> line.
>
> > (gdb) disas blk_set_runtime_active
> > Dump of assembler code for function blk_set_runtime_active:
> > 0xc1518610 <+0>: call 0xc106ac9c <__fentry__>
> > 0xc1518615 <+5>: push %ebp
> > 0xc1518616 <+6>: mov %esp,%ebp
> > 0xc1518618 <+8>: sub $0x14,%esp
> > 0xc151861b <+11>: mov %ebx,-0xc(%ebp)
> > 0xc151861e <+14>: mov %eax,%ebx
> > 0xc1518620 <+16>: mov %gs:0x14,%eax
> > 0xc1518626 <+22>: mov %eax,-0x10(%ebp)
> > 0xc1518629 <+25>: xor %eax,%eax
> > 0xc151862b <+27>: test %ebx,%ebx
> > 0xc151862d <+29>: mov %esi,-0x8(%ebp)
> > 0xc1518630 <+32>: mov %edi,-0x4(%ebp)
> > 0xc1518633 <+35>: je 0xc15186b3 <blk_set_runtime_active+163>
> > 0xc1518635 <+37>: mov 0xfc(%ebx),%eax
> > 0xc151863b <+43>: call 0xc1a4b920 <_raw_spin_lock_irq>
> > 0xc1518640 <+48>: mov 0x150(%ebx),%esi
> > 0xc1518646 <+54>: xor %eax,%eax
> > 0xc1518648 <+56>: mov 0xc1ca7d20,%edi
> > 0xc151864e <+62>: mov %eax,0x154(%ebx)
> > 0xc1518654 <+68>: cmp $0xffffff0c,%esi
> > 0xc151865a <+74>: mov %edi,-0x14(%ebp)
> > 0xc151865d <+77>: je 0xc15186a5 <blk_set_runtime_active+149>
> > 0xc151865f <+79>: mov %edi,0xf4(%esi)

The e-mail at the start of this e-mail thread shows that %esi == NULL at
the time of the crash and also that the crash occurred at offset 79 (0x4f)
in this function. I think that means that the crash occurred in the following
code: pm_request_autosuspend(q->dev) and also that this means that q->dev ==
NULL. Can you test the (untested) patch below?

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 57cae47ab1c2..b029a94a1e66 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3272,7 +3272,6 @@ static void sd_probe_async(struct work_struct *work)
gd->events |= DISK_EVENT_MEDIA_CHANGE;
}

- blk_pm_runtime_init(sdp->request_queue, dev);
device_add_disk(dev, gd);
if (sdkp->capacity)
sd_dif_config_host(sdkp);
@@ -3390,6 +3389,8 @@ static int sd_probe(struct device *dev)
get_device(dev);
dev_set_drvdata(dev, sdkp);

+ blk_pm_runtime_init(sdp->request_queue, dev);
+
get_device(&sdkp->dev); /* prevent release before sd_probe_async() */
WARN_ON_ONCE(!queue_work(system_unbound_wq, &sdkp->probe_work));


Thanks,

Bart.