Re: 2.4.19 SCSI core bug?

From: Russell King (rmk@arm.linux.org.uk)
Date: Thu Sep 12 2002 - 04:01:40 EST


On Wed, Sep 11, 2002 at 10:19:00PM +0100, Russell King wrote:
> Ok, so we were asking for 0xfe 512-byte sectors, which is 130048.
> So why did SCSI tell me that it wanted 38400 bytes in
> SCpnt->request_bufflen?

Ok, problem found.

There's a nice loop in scsi_send_eh_cmnd() which just loops endlessly
trying to retry a SCpnt command on medium error without restoring it
to its pristine state before giving it back to the host, or limiting
the number of retries.

The end result is that the SCSI command says N sectors, the request
list may say N * sector size bytes, but SCpnt->request_bufflen may
be complete rubbish, since the HBA driver is allowed to modify this
field during the processing of a command.

We do have a specific function to restore the SCSI command data to
a pristine state (scsi_eh_retry_command).

Ok, so there's two bugs here:

1. It's possible for SCSI to completely bring a box to a complete
   standstill when it encounters a SCSI error.

2. Not retrying the command in its correct state.

I'll look at cooking up a patch for both of these in the next few days
or so.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Sep 15 2002 - 22:00:28 EST