On Mon, 2004-01-12 at 04:22, Jens Axboe wrote:
On Mon, Jan 12 2004, Arjan van de Ven wrote:
On Mon, Jan 12, 2004 at 10:19:46AM +0100, Jens Axboe wrote:
... and still exists in your 2.4.21 based kernel.
The RHL 2.4.21 kernels don't have the locking patch at all...
But RHEL3 does:
http://kernelnewbies.org/kernels/rhel3/SOURCES/linux-2.4.21-iorl.patch
and the bug is there.
But in RHEL3 the bug is fixed already (not in a released kernel, but the
fix went into our internal kernel some time back and will be in our next
update kernel). From my internal bk tree for this stuff:
[dledford@compaq RHEL3-scsi]$ bk changes -r1.23
ChangeSet@xxxx, 2003-11-10 17:19:54-05:00, dledford@xxxxxxxxxxxxxxxxxxxxxx
drivers/scsi/scsi_error.c
Don't panic if the eh thread is dead, instead do the same thing that
scsi_softirq_handler does and just complete the command as a failed
command.
Change when we wake the eh thread in scsi_times_out to accomodate
the changes to the mlqueue operations.
Clear blocked status on the host and all devices in scsi_restart_operations
-> Don't grab the host_lock in scsi_restart_operations, we aren't doing
anything that needs it. Just goose the queues unconditionally,
scsi_request_fn() will know to not send commands if they shouldn't
go for some reason.
Make sure we account SCSI_STATE_MLQUEUE commands as not being failed
commands in scsi_unjam_host.
But, Jens is right, it's a real bug. I just fixed it in a different
way. And my fix is dependent on other changes in our scsi stack as
well.