Re: [PATCH] cciss: force ignore of responses to unsent scsi commands after kexec reboot

From: Vivek Goyal
Date: Mon Jun 18 2007 - 00:30:57 EST


On Thu, Jun 14, 2007 at 03:25:23PM -0400, Neil Horman wrote:
> On Thu, Jun 14, 2007 at 06:16:03PM -0000, Miller, Mike (OS Dev) wrote:
> >
> >
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman@xxxxxxxxxxxxx]
> > > Sent: Thursday, June 14, 2007 10:31 AM
> > > To: linux-kernel@xxxxxxxxxxxxxxx
> > > Cc: Miller, Mike (OS Dev); ISS StorageDev;
> > > akpm@xxxxxxxxxxxxxxxxxxxx; nhorman@xxxxxxxxxxxxx
> > > Subject: [PATCH] cciss: force ignore of responses to unsent
> > > scsi commands after kexec reboot
> > >
> > > Hey -
> > > cciss hardware currently can continue to send responses
> > > to scsi commands after the host system has undergone a kexec
> > > reboot. The way the driver is currently written, reception of
> > > these commands results in a BUG halt, since it can't match
> > > the response to any issued command since the boot. This
> > > patch corrects that by using the kexec reset_devices command
> > > line parameter to force ignore any commands that it can't correlate.
> > >
> > > Regards
> > > Neil
> > >
> > > Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>
> > >
> > >
> > > cciss.c | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > >
> > > diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> > > index 5acc6c4..ec1c1d2 100644
> > > --- a/drivers/block/cciss.c
> > > +++ b/drivers/block/cciss.c
> > > @@ -2131,6 +2131,14 @@ static int add_sendcmd_reject(__u8 cmd, int ctlr, unsigned long complete)
> > > ctlr, complete);
> > > /* not much we can do. */
> > > #ifdef CONFIG_CISS_SCSI_TAPE
> > > + /* We might get notification of completion of commands
> > > + * which we never issued in this kernel if this boot is
> > > + * taking place after previous kernel's crash. Simply
> > > + * ignore the commands in this case.
> > > + */
> > > + if (reset_devices)
> > > + return 0;
> > > +
> > > return 1;
> > > }

I think this is not the right use of the reset_devices parameter. The
parameter instructs a driver to reset its device before going ahead
with the rest of initialization, because the underlying device might not
be in a sane state. kexec/kdump is one user, and it can also be useful
when the BIOS does not do its job.

When I proposed a crash_boot parameter for kexec/kdump purposes, Andrew
said he was afraid that driver authors would use such a parameter to
solve all kinds of problems.

I think we should stick to the theme of the parameter and implement a
reset routine for the cciss driver instead of simply returning. Consider
a hypothetical scenario: somebody boots the kernel with the
reset_devices parameter (because of an unreliable BIOS), and a kernel
bug makes the driver lose track of a command after issuing it. The
driver will never catch that bug, because it will simply ignore the
response when it arrives.
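To make the difference concrete, here is a rough userspace model of the
two behaviours. It is only a sketch: none of the names below
(model_ctlr, complete_ignore_policy, init_with_reset, and so on) exist
in cciss.c, and "reset" is modelled as nothing more than clearing the
completions the hardware still has pending.

```c
/* Userspace model only. Every name here is hypothetical, not cciss code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TAGS 8

struct model_ctlr {
	bool reset_devices;          /* models the reset_devices boot parameter */
	bool outstanding[MAX_TAGS];  /* tags this kernel believes are in flight */
	bool hw_pending[MAX_TAGS];   /* completions the hardware will still deliver */
};

enum verdict { COMPLETED, IGNORED, REJECTED };

/*
 * Policy A (the posted patch): a completion we cannot match is silently
 * dropped whenever reset_devices is set, whatever the reason for the
 * mismatch.
 */
static enum verdict complete_ignore_policy(struct model_ctlr *c, size_t tag)
{
	assert(tag < MAX_TAGS);
	if (c->outstanding[tag]) {
		c->outstanding[tag] = false;
		return COMPLETED;
	}
	return c->reset_devices ? IGNORED : REJECTED;
}

/*
 * Policy B (what I am suggesting): reset the device during init so that
 * stale completions from the previous kernel never arrive at all.
 */
static void init_with_reset(struct model_ctlr *c)
{
	size_t i;

	if (!c->reset_devices)
		return;
	for (i = 0; i < MAX_TAGS; i++)
		c->hw_pending[i] = false;  /* reset discards in-flight commands */
}

/* With policy B an unmatched completion is always treated as an error. */
static enum verdict complete_strict_policy(struct model_ctlr *c, size_t tag)
{
	assert(tag < MAX_TAGS);
	if (c->outstanding[tag]) {
		c->outstanding[tag] = false;
		return COMPLETED;
	}
	return REJECTED;
}
```

Under policy A, a response for a tag the kernel lost track of because
of its own bug comes back IGNORED when reset_devices is set; under
policy B the same response is REJECTED, so the bug stays visible.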

Mike, you know most about this device. Can you please help out with
implementing a reset routine for it?

Thanks
Vivek