Re: Linux 3.0 oopses when pulling a USB CDROM

From: Hannes Reinecke
Date: Fri Oct 21 2011 - 09:26:54 EST


On 10/18/2011 11:30 PM, James Bottomley wrote:
On Wed, 2011-10-19 at 02:46 +0530, Ankit Jain wrote:
On Wed, Jul 20, 2011 at 3:28 PM, Jack Wang<jack_wang@xxxxxxxxx> wrote:

<snip>
On Sat, Jul 2, 2011 at 12:59 PM, Alan Stern<stern@xxxxxxxxxxxxxxxxxxx>
wrote:
On Sat, 2 Jul 2011, Andi Kleen wrote:

The problem is that blk_peek_request() calls scsi_prep_fn(), which
does this:

	struct scsi_device *sdev = q->queuedata;
	int ret = BLKPREP_KILL;

	if (req->cmd_type == REQ_TYPE_BLOCK_PC)
		ret = scsi_setup_blk_pc_cmnd(sdev, req);
	return scsi_prep_return(q, req, ret);

It doesn't check to see if sdev is NULL, nor does
scsi_setup_blk_pc_cmnd(). That accounts for this error:

I actually added a NULL check in scsi_setup_blk_pc_cmnd early on,
but that just caused RCU CPU stalls afterwards and then eventually
a hung system.

The RCU problem is likely to be a separate issue. It might even be a
result of the use-after-free problem with the elevator.

At any rate, it's clear that the crash in the refcounting log you
posted occurred because scsi_setup_blk_pc_cmnd() called
scsi_prep_state_check(), which tried to dereference the NULL pointer.

Would you like to try this patch to see if it fixes the problem? As I
said before, I'm not certain it's the best thing to do, but it worked
on my system.

Alan Stern




Index: usb-3.0/drivers/scsi/scsi_lib.c
===================================================================
--- usb-3.0.orig/drivers/scsi/scsi_lib.c
+++ usb-3.0/drivers/scsi/scsi_lib.c
@@ -1247,6 +1247,8 @@ int scsi_prep_fn(struct request_queue *q
 	struct scsi_device *sdev = q->queuedata;
 	int ret = BLKPREP_KILL;

+	if (!sdev)
+		return ret;
 	if (req->cmd_type == REQ_TYPE_BLOCK_PC)
 		ret = scsi_setup_blk_pc_cmnd(sdev, req);
 	return scsi_prep_return(q, req, ret);
Index: usb-3.0/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-3.0.orig/drivers/scsi/scsi_sysfs.c
+++ usb-3.0/drivers/scsi/scsi_sysfs.c
@@ -322,6 +322,8 @@ static void scsi_device_dev_release_user
 		kfree(evt);
 	}

+	/* Freeing the queue signals to block that we're done */
+	scsi_free_queue(sdev->request_queue);
 	blk_put_queue(sdev->request_queue);
 	/* NULL queue means the device can't be used */
 	sdev->request_queue = NULL;
@@ -936,8 +938,6 @@ void __scsi_remove_device(struct scsi_de
 	/* cause the request function to reject all I/O requests */
 	sdev->request_queue->queuedata = NULL;

-	/* Freeing the queue signals to block that we're done */
-	scsi_free_queue(sdev->request_queue);
 	put_device(dev);
 }
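
For readers following along, the guard added by the scsi_lib.c hunk can be modelled in user space. This is only a sketch under stated assumptions: every type and name below (model_device, model_queue, PREP_OK, PREP_KILL, model_prep_fn) is a hypothetical stand-in and not a kernel definition. It shows nothing more than the shape of the check: return the "kill" result before anything dereferences a NULL queuedata pointer.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel definitions. */
enum prep_result { PREP_OK = 0, PREP_KILL = 1 };

struct model_device { int online; };
struct model_queue  { struct model_device *queuedata; };

/* Like the state check in the thread, this dereferences sdev unconditionally. */
static enum prep_result model_state_check(const struct model_device *sdev)
{
	return sdev->online ? PREP_OK : PREP_KILL;
}

/* Same shape as the patched prep function: bail out before touching NULL. */
static enum prep_result model_prep_fn(const struct model_queue *q)
{
	const struct model_device *sdev = q->queuedata;

	if (!sdev)
		return PREP_KILL;
	return model_state_check(sdev);
}
```

Without the early return, the call into model_state_check() would dereference NULL once queuedata has been cleared, which is the failure described at the top of this thread.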

This patch seems to resolve the block/scsi NULL-pointer dereferences in
our libsas/isci environment; we have yet to try James' alternative
[1]. Do we potentially need both?

Commit 86cbfb56 moved scsi_free_queue to __scsi_remove_device() but it
seems only the "sdev->request_queue->queuedata = NULL" needed to be
moved?
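
The ordering being asked about can be sketched in user space as follows. This is a simplified model under loud assumptions: the structures, the freed flag and the three functions are hypothetical illustrations, not kernel code, and the model ignores reference counting entirely. It only shows the split the question implies: removal detaches the device from the queue, and the queue resources are given up later, at final release.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel structures. */
struct model_queue {
	void *queuedata;   /* back-pointer to the owning device */
	bool  freed;       /* set once queue resources are released */
};

struct model_device {
	struct model_queue *request_queue;
};

/* Removal step: detach the device so new requests are rejected,
 * but leave the queue itself alive for any remaining users. */
static void model_remove_device(struct model_device *sdev)
{
	sdev->request_queue->queuedata = NULL;
}

/* Final release step: only now give up the queue resources. */
static void model_release_device(struct model_device *sdev)
{
	sdev->request_queue->freed = true;
	sdev->request_queue = NULL;
}

/* A late request can be rejected cleanly only while the queue is valid. */
static bool model_can_reject_safely(const struct model_queue *q)
{
	return !q->freed && q->queuedata == NULL;
}
```

In this model a request arriving between the two steps still finds a valid queue and is rejected because queuedata is NULL; freeing in the removal step would close that window.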

The conversation appeared to be awaiting test results...

[1]: http://marc.info/?l=linux-scsi&m=131007155700831&w=2

--
Dan
[Jack Wang]
This patch fixes the kernel panic when hot-plugging a disk during I/O; I tested
it using pm8001 on 3.0.0-rc6 with the above patch applied.

I don't see this patch in scsi-misc-2.6 or Linus' tree. Is there a
different patch that fixes the issue?

It should be fixed by

commit 777eb1bf15b8532c396821774bf6451e563438f5
Author: Hannes Reinecke<hare@xxxxxxx>
Date: Wed Sep 28 08:07:01 2011 -0600

block: Free queue resources at blk_release_queue()

As much as I hate to admit it, it looks as if this is only a fix for the second part of the original patch.
I've got reports that we still see crashes, and those are fixed by the patch to scsi_lib.c.

So please include this part.
Do you need a resend?

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@xxxxxxx +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)