Re: SCSI Kernel Problem - BAD

Leonard N. Zubkoff (lnz@dandelion.com)
Wed, 13 Mar 1996 13:08:02 -0800


Date: Mon, 11 Mar 1996 16:56:33 -0600 (CST)
From: Simon Shapiro <Shimon@i-Connect.Net>

Hi Leonard ( and all the other Linux/SCSI gurus :-);

What happens if the HBA starts losing interrupts every now and then?

That probably depends on the architecture of the individual host adapter and
the driver, and precisely what you mean by "losing" interrupts. If you mean
that there is enough latency on an interrupt that a second one would have been
posted had the first one completed, that's not a problem at all for the
BusLogic driver. Each interrupt will cause the interrupt handler to scan
incoming mailboxes for any completed work, so if multiple incoming mailboxes
have completion information that's fine. One does have to be careful to poll
the host adapters properly so that there aren't any race conditions, of course.
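
As a rough illustration of the scanning described above, here is a minimal
sketch in C. It is not the BusLogic driver's code: the ring size, the structure
layout, and the names `post_completion` and `scan_incoming_mailboxes` are all
hypothetical, and a real driver would also need the careful ordering against
the hardware that avoids the race conditions mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define MAILBOX_COUNT 8  /* hypothetical ring size, chosen for illustration */

enum mailbox_status { MAILBOX_FREE = 0, MAILBOX_COMPLETED = 1 };

struct incoming_mailbox {
    enum mailbox_status status;
    int command_id;          /* stands in for a pointer to the finished command */
};

struct adapter {
    struct incoming_mailbox ring[MAILBOX_COUNT];
    size_t post;             /* next slot the (simulated) adapter fills */
    size_t next;             /* next slot the interrupt handler examines */
};

/* Simulated adapter side: record one completion in the next ring slot.
 * Returns 0 on success, -1 if that slot has not been consumed yet. */
static int post_completion(struct adapter *a, int command_id)
{
    assert(a != NULL);
    if (a->ring[a->post].status != MAILBOX_FREE)
        return -1;
    a->ring[a->post].command_id = command_id;
    a->ring[a->post].status = MAILBOX_COMPLETED;
    a->post = (a->post + 1) % MAILBOX_COUNT;
    return 0;
}

/* Handler side: starting where the previous scan stopped, consume every
 * completed mailbox and stop at the first free one. A single call therefore
 * picks up all completions that accumulated while an interrupt was delayed.
 * Returns the number of completions handled (at most max_ids). */
static int scan_incoming_mailboxes(struct adapter *a, int *completed_ids,
                                   int max_ids)
{
    int handled = 0;

    assert(a != NULL && completed_ids != NULL);
    while (handled < max_ids &&
           a->ring[a->next].status == MAILBOX_COMPLETED) {
        completed_ids[handled++] = a->ring[a->next].command_id;
        a->ring[a->next].status = MAILBOX_FREE;   /* hand the slot back */
        a->next = (a->next + 1) % MAILBOX_COUNT;
    }
    return handled;
}
```

If three completions are posted before the handler runs, one call to
`scan_incoming_mailboxes` returns all three, and a second call finds nothing
left to do.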

As I was reading the dialogue on this subject, it occurred to me that the
problem is more severe when the I/O rate is high: not the amount of data
as much as the number of I/O operations per second. In a PC, these are
typically directly related to interrupts. I also noticed that a busy NFS
connection will aggravate it even more, and I just saw that when NFS was
busy on these partitions, I was losing interrupts on the wd.c driver.

I suspect the mid-level SCSI code may have problems with high I/O rates. In my
performance testing, where I pre-allocated and reused the SCSI Command
structures, I saw rates as high as 5830 I/O operations per second through 4
host adapters with 2 disks each; I was seeing 5633 interrupts per second which
implies there were definitely some interrupts serviced slowly enough for there
to be multiple incoming mailboxes used. This worked just fine, and I also had
no timeout problems unless I overloaded the SCSI bus (used large enough block
sizes that timeouts were inevitable). This suggests that the present timeout
problems are due to device allocation in scsi.c or the layers above.
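
The arithmetic behind that inference can be written out as a small C sketch.
The function names are hypothetical, and it assumes that each I/O operation
produces exactly one completion and that every interrupt services at least one
completion.

```c
#include <assert.h>

/* Completions that must have shared an interrupt with another completion,
 * assuming one completion per I/O and at least one completion per interrupt. */
static long extra_completions(long ios_per_sec, long irqs_per_sec)
{
    return ios_per_sec > irqs_per_sec ? ios_per_sec - irqs_per_sec : 0;
}

/* Average number of completions serviced per interrupt. */
static double completions_per_interrupt(long ios_per_sec, long irqs_per_sec)
{
    assert(irqs_per_sec > 0);
    return (double)ios_per_sec / (double)irqs_per_sec;
}
```

With the figures above, `extra_completions(5830, 5633)` is 197, so under those
assumptions 197 completions per second were picked up by an interrupt that was
already servicing another mailbox, an average of roughly 1.035 completions per
interrupt.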

I didn't have NFS or other heavy network activity, so perhaps that is related.

Leonard