Re: Kernel stack corruption with SCSI

David S. Miller (davem@jenolan.rutgers.edu)
Fri, 3 Jan 1997 19:38:12 -0500


Date: Fri, 3 Jan 1997 17:38:41 GMT
From: Alan Cox <alan@cymru.net>

Ok I did some digging and some mmap runs. Basically in some cases
an mmap of a page on a SCSI device does indeed cause the kernel to
use more than 4K of kernel stack and crash. I _suspect_ it's only
just tripping when the kernel stack of the process is quite deep on
a page fault during the mmap, as it's not a simple run-this-and-crash
case.

Anyway it is a definite 2.0.x bug. I also cannot duplicate it so
far with IDE.

I think the problem is that the scsi generic code can get extremely
deep call chains at arbitrary points in time.

So take two things: the largest scsi call chain that can result from
an interrupt signaling the completion of a command (this is very
large), and the deepest point in the fault call chain at which a
process can block for a page. Add these two together and you most
likely have a stack overrun. The worst case is probably when the
process wakes up: while it is still in schedule() (the deepest part
of the chain), a scsi interrupt comes in and provokes this deep
interrupt call chain. This is most likely the case we are seeing.
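
To make the arithmetic above concrete, here is a rough sketch in C of
the "add the two chains together" check. Everything in it is
hypothetical: the struct, the function names and any byte counts fed
to it are made up for illustration and are not measured from any real
kernel or driver.

```c
#include <assert.h>
#include <stddef.h>

/* One call frame and a guessed stack cost in bytes (hypothetical). */
struct frame {
    const char *name;
    size_t bytes;
};

/* Sum the stack used by a call chain of n frames. */
size_t chain_bytes(const struct frame *chain, size_t n)
{
    size_t total = 0;
    size_t i;

    assert(chain != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += chain[i].bytes;
    return total;
}

/*
 * Return 1 if the process chain plus the interrupt chain that lands
 * on top of it no longer fit in the kernel stack, 0 otherwise.
 */
int overruns(size_t stack_bytes, size_t proc_bytes, size_t irq_bytes)
{
    return proc_bytes + irq_bytes > stack_bytes;
}
```

With made-up numbers, a 2500 byte fault chain plus a 2000 byte
interrupt chain overruns a 4096 byte stack but fits in an 8192 byte
one. That matches the point that a second page helps, without proving
that it is always enough.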

This may not be a trivial problem to fix. The sparc uses two page
stacks, and this helps, but:

a) I think sparc is still vulnerable to the case above

b) making the intel use two page stacks would just be
nothing short of stupid

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s ////
ethernet. Beat that! ////
-----------------------------------------////__________ o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><