FlashPoint card hanging system (BT-950)

William Lash (welash@xnet.com)
Wed, 29 Sep 1999 11:33:56 -0500 (CDT)


I've been having a problem with Linux freezing solid sometimes when
talking to my CDRW drive. The CDRW drive is a Memorex CRW620, and is
connected to a Mylex/Buslogic BT-950 (Flashpoint) SCSI card. This is
the only device on the SCSI bus.

This setup was working fine on 2.0.36 for a long time, and then I
upgraded the processor from a K6II-300 to a K6II-450. This only
required a change of the processor voltage and clock multiplier, the
bus has always been running at 100 MHz. I have seen no other
instability due to the higer processor speed. No Sig 11s on kernel
compiles, no other lockups. As long as I didn't mess with the CDRW I
was ok.

My initial impulse was to upgrade my kernel, first to 2.0.38, then to
2.2.12. Both of these kernels exhibit the same behavior. I can
reliably lock the machine by mounting and unmounting a disc in the CDRW
drive a few times.

I reduced the multiplier of the new processor to 3 so that I was
running at 300MHz, and the CDRW drive worked fine. Tried it at 400 MHz
and I still have lock ups.

I searched through the kernel mailing list, and saw mentions of lockups
with the VIA chipset (I have a FIC VA503+ motherboard) so I followed
the suggestions of people on that discussion and changed some of the
BIOS settings to no avail. I saw some talk about other SCSI lockups,
and tried changing a lot of different things in the SCSI BIOS, to no
avail.

I finally tried the IKD patch to try and see where the processor was
getting stuck. Using the feature that writes the EIP to the video
memory, I saw that it kept reporting an address that was in the
routine phaseBusFree() of the FlashPoint.c file. This number was
constant, but the second number (on the same line) was still counting
(very quickly). I wasn't exactly sure what this was supposed to mean,
there were no loops inside that routine to get stuck in, so I put it
aside for a while.

Finally I looked at where phaseBusFree() is called from, and noticed 2
things:

1. It is called from the ISR for the card
2. That there are a couple of while loops in the ISR that could be
getting stuck.

So I decided to try to add some code to see what was going on. After a
couple of lockups with no information, I finally got something that
tells me what is going on, and "recovers" from it. I put recovers in
quotes, because I doubt that this is a good solution, basically it
detects if the main while() loop, (which is servicing the interrupt
until all bits are clear in the interrupt status register), runs more
than a reasonable number of times. When it detects this, it prints out
some information, tries to clear out all the interrupts, and exits with
a code that causes the higher level routine to reset the Host Adapter.

If I return 0 instead of the code to reset the adapter, the system
still locks up.

Anyway, I find that the interrupt is entered due to a BUSFREE
indication in the interrupt status register. The ISR goes down, and
calls phaseBusFree(), presumably to handle that condition, but the bit
never seems to get cleared in the interrupt status register.

Of course, the interesting question is, "Why does this only show up
with a faster processor?". There seem to be 2 possibilities, either
the faster processor causes the original BUSFREE interrupt to occur in
such a way that the normal BUSFREE handling can't clear it, or that the
faster processor makes it so that the code that is supposed to clear
the BUSFREE condition doesn't work correctly.

Anyway, attached below are the patch that I used to gather data, and
the logged results from mounting the CD several times. The patch
contains code to detect other conditions in the ISR that I thought
might cause problems as well. Any ideas about what to try would be
appreciated. I will forward a copy of this to mylex tech support
as well. It looks like the FlashPoint.c file is basically some code
from their development kit.

Bill Lash
welash@xnet.com

FlashPoint.c patch --------

*** orig.FlashPoint.c Wed Sep 29 09:35:37 1999
--- FlashPoint.c Wed Sep 29 10:07:43 1999
***************
*** 4684,4695 ****
--- 4684,4700 ----
UCHAR thisCard,result,bm_status, bm_int_st;
USHORT hp_int;
UCHAR i, target;
+ int inner_ctr, outer_ctr, first_int_stat, prev_int_stat;
#if defined(DOS)
USHORT ioport;
#else
ULONG ioport;
#endif

+ inner_ctr = 0;
+ outer_ctr = 0;
+ first_int_stat = 0;
+ prev_int_stat = 0;
mOS_Lock((PSCCBcard)pCurrCard);

thisCard = ((PSCCBcard)pCurrCard)->cardIndex;
***************
*** 4714,4719 ****
--- 4719,4730 ----
bm_status)
{

+ if(outer_ctr == 0)
+ {
+ first_int_stat = hp_int;
+ }
+ outer_ctr++;
+
currSCCB = ((PSCCBcard)pCurrCard)->currentSCCB;

#if defined(BUGBUG)
***************
*** 4748,4754 ****
may not show up if another device reselects us in 1.5us or
less. SRR Wednesday, 3/8/1995.
*/
! while (!(RDW_HARPOON((ioport+hp_intstat)) & (BUS_FREE | RSEL))) ;
}

if (((PSCCBcard)pCurrCard)->globalFlags & F_HOST_XFER_ACT)
--- 4759,4781 ----
may not show up if another device reselects us in 1.5us or
less. SRR Wednesday, 3/8/1995.
*/
! inner_ctr = 0;
! while (!(RDW_HARPOON((ioport+hp_intstat)) & (BUS_FREE | RSEL)))
! {
! if(inner_ctr++ > 10000)
! {
! printk("BillDebug: Inner Loop executed too much. outer_ctr=%d, first_int_stat=0x%x, prev_int_stat =0x%x, hp_int=0x%x\n",outer_ctr,first_int_stat,prev_int_stat,hp_int);
!
! WRW_HARPOON((ioport+hp_intstat), CLR_ALL_INT);
!
! mOS_Lock((PSCCBcard)pCurrCard);
! MENABLE_INT(ioport);
! mOS_UnLock((PSCCBcard)pCurrCard);
! return(0xfe);
!
! }
! }
! ;
}

if (((PSCCBcard)pCurrCard)->globalFlags & F_HOST_XFER_ACT)
***************
*** 4908,4913 ****
--- 4935,4945 ----
WRW_HARPOON((ioport+hp_intstat), ITICKLE);
((PSCCBcard)pCurrCard)->globalFlags |= F_NEW_SCCB_CMD;
}
+ else
+ {
+ printk("BillDebug: Reached Else. outer_ctr=%d, first_int_stat=0x%x, prev_int_stat =0x%x, hp_int=0x%x\n",outer_ctr,first_int_stat,prev_int_stat,hp_int);
+
+ }



***************
*** 4930,4935 ****
--- 4962,4980 ----
break;

}
+ if(outer_ctr > 50)
+ {
+ printk("BillDebug: Outer Loop executed too much. outer_ctr=%d, first_int_stat=0x%x, prev_int_stat =0x%x, hp_int=0x%x\n",outer_ctr,first_int_stat,prev_int_stat,hp_int);
+
+ WRW_HARPOON((ioport+hp_intstat), CLR_ALL_INT);
+
+ mOS_Lock((PSCCBcard)pCurrCard);
+ MENABLE_INT(ioport);
+ mOS_UnLock((PSCCBcard)pCurrCard);
+ return(0xfe);
+ }
+
+ prev_int_stat = hp_int;

} /*end while */

------End of FlashPoint.c Patch, begin excerpt from logfile -----

Sep 28 22:16:21 welash kernel: scsi: ***** BusLogic SCSI Driver Version 2.1.15 of 17 August 1998 *****
Sep 28 22:16:21 welash kernel: scsi: Copyright 1995-1998 by Leonard N. Zubkoff <lnz@dandelion.com>
Sep 28 22:16:21 welash kernel: scsi0: Configuring BusLogic Model BT-950 PCI Wide Ultra SCSI Host Adapter
Sep 28 22:16:21 welash kernel: scsi0: Firmware Version: 5.02, I/O Address: 0x7000, IRQ Channel: 11/Level
Sep 28 22:16:21 welash kernel: scsi0: PCI Bus: 0, Device: 10, Address: 0xE7001000, Host Adapter SCSI ID: 7

Sep 28 22:16:21 welash kernel: scsi0: Parity Checking: Enabled, Extended Translation: Enabled
Sep 28 22:16:21 welash kernel: scsi0: Synchronous Negotiation: Fast, Wide Negotiation: Enabled
Sep 28 22:16:21 welash kernel: scsi0: Disconnect/Reconnect: NYYYYYY#YYYYYYYY, Tagged Queuing: NYYYYYY#YYYY
YYYY
Sep 28 22:16:21 welash kernel: scsi0: Driver Queue Depth: 255, Scatter/Gather Limit: 128 segments
Sep 28 22:16:21 welash kernel: scsi0: Tagged Queue Depth: Automatic, Untagged Queue Depth: 3
Sep 28 22:16:21 welash kernel: scsi0: Error Recovery Strategy: Default, SCSI Bus Reset: Enabled
Sep 28 22:16:21 welash kernel: scsi0: SCSI Bus Termination: Both Enabled, SCAM: Disabled
Sep 28 22:16:21 welash kernel: scsi0: *** BusLogic BT-950 Initialized Successfully ***
Sep 28 22:16:21 welash kernel: scsi0 : BusLogic BT-950
Sep 28 22:16:21 welash kernel: scsi : 1 host.
Sep 28 22:16:21 welash kernel: Vendor: Model: CRW620 Rev: 1.20
Sep 28 22:16:21 welash kernel: Type: CD-ROM ANSI SCSI revision: 02
Sep 28 22:16:21 welash kernel: Detected scsi CD-ROM sr0 at scsi0, channel 0, id 0, lun 0
Sep 28 22:16:30 welash kernel: scsi0: Target 0: Queue Depth 3, Synchronous at 10.0 MB/sec, offset 15
Sep 28 22:16:30 welash kernel: sr0: scsi3-mmc drive: 6x/6x writer cd/rw xa/form2 cdda tray
Sep 28 22:16:56 welash kernel: BillDebug: Outer Loop executed too much. outer_ctr=51, first_int_stat=0x8000,
prev_int_stat =0x8000, hp_int=0x8000
Sep 28 22:16:56 welash kernel: scsi0: Internal FlashPoint Error detected - Resetting Host Adapter
Sep 28 22:16:56 welash kernel: scsi0: Resetting BusLogic BT-950 due to Host Adapter Internal Error
Sep 28 22:16:56 welash kernel: scsi0: *** BusLogic BT-950 Initialized Successfully ***
Sep 28 22:17:06 welash kernel: BillDebug: Outer Loop executed too much. outer_ctr=51, first_int_stat=0x8000,
prev_int_stat =0x8000, hp_int=0x8000
Sep 28 22:17:06 welash kernel: scsi0: Internal FlashPoint Error detected - Resetting Host Adapter
Sep 28 22:17:06 welash kernel: scsi0: Resetting BusLogic BT-950 due to Host Adapter Internal Error
Sep 28 22:17:06 welash kernel: scsi0: *** BusLogic BT-950 Initialized Successfully ***

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/