Re: Kernel Stack Overflows

Leonard N. Zubkoff (lnz@dandelion.com)
Tue, 20 Feb 1996 01:18:36 -0800


From: "Eric Youngdale" <eric@aib.com>
Date: Mon, 19 Feb 1996 11:01:04 -0500

Yes. As little as possible :-).

I'd never have guessed that...

In the early days of SCSI, we had a problem where using certain
removable disks with the seagate driver would make the system spontaneously
reboot. This was finally tracked down to the device not disconnecting;
as a result, the seagate driver just kept recursing,
eating up more and more stack as long as there were more requests to
be processed. If the request queue ever became empty, all of the stack
would be released, of course. As I recall, we were able to recurse
something like 15 to 20 times before the stack overflowed (or was it
before the system crashed?).

Indeed. I discovered another form of stack overflow death not too long ago.
It's very dangerous for the Queue Command function to immediately complete a
command with an error that might be retried. The completion processing may
retry immediately and so on. Not pretty. In the latest version of the
BusLogic driver I'm working on I've removed all such cases, however unlikely
they may be. Better to toss the SCSI Command on the floor and let a timeout
(and probably board reset) happen than risk recursive stack death.

One of the unstated rules in the design of the SCSI code is that
no arrays or large structures are allocated off the
stack. The DMA pool generally serves as a reservoir of memory which can
be allocated atomically (whether it needs to be DMA-able or not). Also, in the
1542 driver, I don't allow further interrupts while I am in the interrupt
handler. I just sit in there and keep looping as long as there is something
that needs to be done, and then return when there is nothing to do.

For some reason, I believe the BusLogic drivers have always enabled interrupts
after scanning the incoming mailboxes. With multiple cards on different IRQs,
this could lead to stacked interrupts several levels deep. I think the next
version of the driver will remove the sti(), since it appears it may be safer,
and completion processing should be quite fast.

Leonard