Re: smp dead lock of io_request_lock/queue_lock patch

From: Doug Ledford
Date: Mon Jan 12 2004 - 08:32:24 EST


On Mon, 2004-01-12 at 04:22, Jens Axboe wrote:
> On Mon, Jan 12 2004, Arjan van de Ven wrote:
> > On Mon, Jan 12, 2004 at 10:19:46AM +0100, Jens Axboe wrote:
> > >
> > > ... and still exists in your 2.4.21 based kernel.
> >
> > The RHL 2.4.21 kernels don't have the locking patch at all...
>
> But RHEL3 does:
>
> http://kernelnewbies.org/kernels/rhel3/SOURCES/linux-2.4.21-iorl.patch
>
> and the bug is there.

But in RHEL3 the bug is fixed already (not in a released kernel, but the
fix went into our internal kernel some time back and will be in our next
update kernel). From my internal bk tree for this stuff:

[dledford@compaq RHEL3-scsi]$ bk changes -r1.23
ChangeSet@xxxx, 2003-11-10 17:19:54-05:00, dledford@xxxxxxxxxxxxxxxxxxxxxx
drivers/scsi/scsi_error.c
Don't panic if the eh thread is dead, instead do the same thing that
scsi_softirq_handler does and just complete the command as a failed
command.
Change when we wake the eh thread in scsi_times_out to accomodate
the changes to the mlqueue operations.
Clear blocked status on the host and all devices in scsi_restart_operations
-> Don't grab the host_lock in scsi_restart_operations, we aren't doing
anything that needs it. Just goose the queues unconditionally,
scsi_request_fn() will know to not send commands if they shouldn't
go for some reason.
Make sure we account SCSI_STATE_MLQUEUE commands as not being failed
commands in scsi_unjam_host.

But, Jens is right, it's a real bug. I just fixed it in a different
way. And my fix is dependent on other changes in our scsi stack as
well.

--
Doug Ledford <dledford@xxxxxxxxxx> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/