Re: Strange block/scsi/workqueue issue

From: James Bottomley
Date: Tue Apr 12 2011 - 13:41:41 EST

On Tue, 2011-04-12 at 17:51 +0100, Steven Whitehouse wrote:
> Still not quite there, but looking more hopeful now,

Not sure I share your optimism; but this one

> scsi 0:2:1:0: Direct-Access DELL PERC 6/i 1.22 PQ: 0 ANSI: 5
> scsi: killing requests for dead queue
> ------------[ cut here ]------------
> WARNING: at lib/kref.c:34 kref_get+0x2d/0x30()
> Hardware name: PowerEdge R710
> Modules linked in:
> Pid: 386, comm: kworker/6:1 Not tainted 2.6.39-rc2+ #193
> Call Trace:
> [<ffffffff8108fa9a>] warn_slowpath_common+0x7a/0xb0
> [<ffffffff8108fae5>] warn_slowpath_null+0x15/0x20
> [<ffffffff813c984d>] kref_get+0x2d/0x30
> [<ffffffff813c824a>] kobject_get+0x1a/0x30
> [<ffffffff81460874>] get_device+0x14/0x20
> [<ffffffff81478bd7>] scsi_request_fn+0x37/0x4a0

Is definitely a race between the last put of the SCSI device and the
block delayed work. The signal that mediates that race is supposed to
be the q->queuedata being null, but that doesn't get set until some time
into the release function (by which time the ref is already zero).

Closing the window completely involves setting this to NULL before we do
the final put when we know everything else is gone. So, here's the next



Index: linux-2.6/drivers/scsi/scsi_sysfs.c
--- linux-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ linux-2.6/drivers/scsi/scsi_sysfs.c
@@ -323,7 +323,6 @@ static void scsi_device_dev_release_user

if (sdev->request_queue) {
- sdev->request_queue->queuedata = NULL;
/* user context needed to free queue */
/* temporary expedient, try to catch use of queue lock
@@ -937,6 +936,7 @@ void __scsi_remove_device(struct scsi_de
if (sdev->host->hostt->slave_destroy)
+ sdev->request_queue->queuedata = NULL;
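
For illustration only, the ordering problem described above can be modelled in
userspace. This is a minimal single-threaded sketch, not kernel code: every
name in it (model_sdev, model_queue, model_get, model_put, model_request_fn,
run_old_ordering, run_new_ordering) is hypothetical, the refcount is a plain
int standing in for the kref, and the "delayed work" is simply called by hand
at the point where the race window would be. It assumes the only signal the
work item checks is whether queuedata is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the request queue and the SCSI device. */
struct model_queue {
	void *queuedata;	/* NULL is the "device is gone" signal */
};

struct model_sdev {
	int refcount;		/* stands in for the kref in the device */
	struct model_queue queue;
};

/*
 * Models kref_get(): returns 0 if the reference was taken on an object
 * whose count had already dropped to zero (the case the WARNING reports),
 * 1 otherwise.
 */
static int model_get(struct model_sdev *sdev)
{
	int was_live = sdev->refcount > 0;

	sdev->refcount++;
	return was_live;
}

static void model_put(struct model_sdev *sdev)
{
	assert(sdev->refcount > 0);
	sdev->refcount--;
}

/*
 * Models the delayed work entering the request function: bail out if the
 * queue is marked dead, otherwise take and drop a device reference.
 * Returns 1 if that was safe, 0 if the get hit a zero refcount.
 */
static int model_request_fn(struct model_queue *q)
{
	struct model_sdev *sdev = q->queuedata;
	int ok;

	if (!sdev)
		return 1;	/* dead queue, nothing is touched */

	ok = model_get(sdev);
	model_put(sdev);
	return ok;
}

/* Old ordering: queuedata is only cleared inside the release path. */
static int run_old_ordering(void)
{
	struct model_sdev sdev = { .refcount = 1 };
	int ok;

	sdev.queue.queuedata = &sdev;
	model_put(&sdev);			/* final put, count is now zero */
	ok = model_request_fn(&sdev.queue);	/* work runs inside the window */
	sdev.queue.queuedata = NULL;		/* cleared too late */
	return ok;
}

/* New ordering: queuedata is cleared before the final put. */
static int run_new_ordering(void)
{
	struct model_sdev sdev = { .refcount = 1 };

	sdev.queue.queuedata = &sdev;
	sdev.queue.queuedata = NULL;		/* cleared during removal */
	model_put(&sdev);			/* final put */
	return model_request_fn(&sdev.queue);	/* work sees the dead queue */
}
```

In this model run_old_ordering() returns 0, because the work item still sees a
non-NULL queuedata after the count has reached zero, while run_new_ordering()
returns 1, because the signal is already in place by the time the last
reference goes away. A real kernel fix also has to consider locking and memory
ordering around the check, which this sketch deliberately leaves out.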

