Funky PCI bug.

Dan Merillat (Dan@merillat.org)
Tue, 19 Nov 1996 17:44:32 -0500 (EST)


Hmm... got a rather nasty situation last night.
It's definitely bad hardware that caused it, but Linux definitely died badly.
OK, here's the scenario.

I put a new Adaptec board (AHA 3940W) in an older SiS system
(Host bridge: Silicon Integrated Systems 85C496 (rev 1).)

Now, the SiS is crap, and didn't recognise the 3940. No problem, I had
an older 2940 that it liked. So I put the 2940 in next to the 3940, and
put all the SCSI drives on the 2940. Because I was testing, I left the 3940 in.
The PCI BIOS reported only the 2940 (as was expected).

When I booted, at the PCI probe, I got:
Nov 19 04:50:42 news kernel: e/linux/pci.h
Nov 19 04:50:42 news kernel: Warning : Unknown PCI device (cd:0). Please read include/linux/pci.h
Nov 19 04:50:42 news kernel: Warning : Unknown PCI device (ce:0). Please read include/linux/pci.h
Nov 19 04:50:42 news kernel: Warning : Unknown PCI device (cf:0). Please read include/linux/pci.h
.........
Nov 19 04:50:42 news kernel: Warning : Unknown PCI device (fe:0). Please read include/linux/pci.h
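
For anyone who wants to tally these warnings from a saved syslog, here is a minimal Python sketch. The function name summarize_unknown_pci and the SAMPLE text are made up for illustration. It assumes only the message format quoted above and makes no claim about what the two hex fields mean.

```python
import re

# Matches the kernel warning quoted above. Only the part up to the closing
# parenthesis is required, so line-wrapped log entries still match.
WARNING_RE = re.compile(r"Unknown PCI device \(([0-9a-fA-F]+):([0-9a-fA-F]+)\)")


def summarize_unknown_pci(log_text):
    """Return (count, lowest, highest) over the first hex field of each warning."""
    firsts = [int(m.group(1), 16) for m in WARNING_RE.finditer(log_text)]
    if not firsts:
        return 0, None, None
    return len(firsts), min(firsts), max(firsts)


# Hypothetical sample: three of the lines quoted in this message.
SAMPLE = """\
Nov 19 04:50:42 news kernel: Warning : Unknown PCI device (cd:0). Please read include/linux/pci.h
Nov 19 04:50:42 news kernel: Warning : Unknown PCI device (ce:0). Please read include/linux/pci.h
Nov 19 04:50:42 news kernel: Warning : Unknown PCI device (fe:0). Please read include/linux/pci.h
"""

if __name__ == "__main__":
    count, low, high = summarize_unknown_pci(SAMPLE)
    print(count, hex(low), hex(high))
```

On the three sample lines this reports 3 warnings spanning 0xcd to 0xfe. If the elided lines above run consecutively, cd through fe alone would be 50 entries (0xfe - 0xcd + 1).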

The system was otherwise up and running fine: the 2940 was running the SCSI chain, and all services were functioning.

So I decided to see what /proc/pci had to say.

All the logs managed to capture was:
Nov 19 05:13:55 news kernel: scsi : aborting command due to timeout : pid 105727, scsi0, channel 0, id 1, lun 0 Read (6) 00 6f e7 02 00
Nov 19 05:13:55 news kernel: aic7xxx: (abort) Aborting scb 7, TCL 1/0/0
Nov 19 05:13:55 news kernel: scsi : aborting command due to timeout : pid 105728, scsi0, channel 0, id 1, lun 0 Read (6) 00 f7 bf 02 00
<lots of similar errors>
Nov 19 05:14:10 news kernel: SCSI bus is being reset for host 0 channel 0.
Nov 19 05:14:10 news kernel: aic7xxx: (reset) target/channel 1/0
Nov 19 05:14:10 news kernel: aic7xxx: (abort_scb) scb 7 is disconnected; bus device reset message queued.
Nov 19 05:14:10 news kernel: scsi : aborting command due to timeout : pid 105728, scsi0, channel 0, id 1, lun 0 Read (6) 00 f7 bf 02 00
Nov 19 05:14:10 news kernel: aic7xxx: (abort) Aborting scb 0, TCL 1/0/0
Nov 19 05:14:10 news kernel: SCSI host 0 abort (pid 105728) timed out - resetting
Nov 19 05:14:10 news kernel: SCSI bus is being reset for host 0 channel 0.
Nov 19 05:14:10 news kernel: aic7xxx: (reset) target/channel 1/0
Nov 19 05:14:10 news kernel: aic7xxx: (abort_scb) asserted ATN - bus device rese

and this is where it got clipped.

I pulled the 3940 out and all is now happy.

The odd part is that everything WAS happy, except that the kernel thought there were 40 PCI devices.

I never got any output from cat /proc/pci so I have no idea what the kernel
thought it had found.

Anyway, the board that fails like that is in use, but I should be able
to do more tests on it soonish. I don't think we need to be able
to initialize unknown PCI cards (actually, I think the 3940 is incompatible
with early-rev PCI chipsets), but I do think we can be more graceful than
locking the PCI bus.

Anything people want me to try?

--Dan