Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix

From: Vivek Goyal
Date: Mon Jun 26 2006 - 14:17:28 EST


On Mon, Jun 26, 2006 at 11:52:28AM -0600, Eric W. Biederman wrote:
> "Miller, Mike (OS Dev)" <Mike.Miller@xxxxxx> writes:
>
> > Thanks Eric, that helps me understand. Section 8.2.2 of the open cciss
> > spec supports a reset message. Target 0x00 is the controller. We could
> > add this to the init routine to ensure the board is made sane again but
> > this would drastically increase init time under normal circumstances.
>
> Where does the init time penalty come from? How large is the
> init penalty? I suspect it is from waiting for the scsi disks to spin up.
> But I am just guessing in the dark.
>
> > And I suspect this is a hard reset, also. Not sure if that would
> > negatively impact kdump. If there were some condition we could test
> > against and perform the reset when that condition is met it would not
> > impact 99.9% of users.
>
> I am wondering if it is possible to look at the controller and
> see if it is in a bad state, (i.e. in some state besides just coming
> out of reset) and if so issue a reset. If this really is a long operation
> that would be the ideal way to handle it.
>

That's a good question. MPT fustion driver already does something like
this. It retrieves the state of IOC and then checks whether there is
a need of reset or not.

/*
* Check to see if IOC got left/stuck in doorbell handshake
* grip of death. If so, hard reset the IOC.
*/
if (ioc_state & MPI_DOORBELL_ACTIVE) {
statefault = 1;
printk(MYIOC_s_WARN_FMT "Unexpected doorbell active!\n",
ioc->name);
}

But then question will be if all the devices out there provide the
capability to query something similar to if we have just come out of reset
state or not.

> If the amount of time is really user noticeable and testing for it
> is impossible then it is probably time to talk kernel command line
> options. >
> Although it might simply be appropriate to handle commands completing
> you didn't start. I am not at all familiar with that particular piece
> of hardware so I can't make a good guess on what needs to happen there.
>
> > Thoughts, comments, flames?
>
> Good question.
>
> It is a bit of a pain but not too hard to setup a test environment
> so you can reproduce this if you are interested. Vivek should
> be the authority there.
>

Mike, I have got one setup ready with me. I have got a Compaq Smart Array
5300 controller. I can reproduce this issue consistently. I don't know
much about this device. Is it possible for you to post a patch for
resetting the device during initialization. I can test the fix and provide
you more data.

Thanks
Vivek
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/