filesystem corruption with Linux-1.3.66 and newer Adaptec 2940 code

James MacKinnon (jmack@phys.ualberta.ca)
Sun, 18 Feb 1996 22:21:50 -0700 (MST)


From:
James S. MacKinnon Office: P-139 Avahd-Bhatia Physics Lab
Department of Physics Voice : (403) 492-8226
University of Alberta email : Jim.MacKinnon@Phys.UAlberta.CA
Edmonton, Canada T6G 2N5
WWW: http://www.phys.ualberta.ca/~jmack/jmack.html
--
Hi Linus,

Here is the problem:

I seem to have run into a SCSI disk I/O bug in 1.3.66 that does not occur in the 1.3.5x and early 1.3.6x series.

The machine is a P133 on an ASUS motherboard with 256k pipeline-burst cache and 16MB EDO RAM. The SCSI controller is an Adaptec PCI 2940, and the disk in use is a Fujitsu M1606S-512.

While running BYTE's benchmark under 1.3.66, writing to a SCSI disk during the Filesystem Throughput test, all hell breaks loose and the root filesystem disappears (the root disk, however, is IDE, not SCSI). The test is heavily disk I/O intensive, and the job keeps plugging along, even though it can no longer execute any binaries because of the corruption in the root filesystem, until the kernel finally hangs solid:

# df .
Filesystem         1024-blocks  Used Available Capacity Mounted on
/dev/sda1             513147  455218    31424     94%   /Users

#./Run ...(ok until):

Execl Throughput Test 1 2 3 4 5 6

Filesystem Throughput Test (10 second test)
1 2 3 4 ./Run: /bin/sync: cannot execute binary file
./Run: /bin/sync: cannot execute binary file
sleep: can't load dynamic linker '/lib/ld.so'
5./Run: /bin/sync: cannot execute binary file
./Run: /bin/sync: cannot execute binary file
sleep: can't load dynamic linker '/lib/ld.so'
6./Run: /bin/sh: cannot execute binary file
sed: can't load dynamic linker '/lib/ld.so'
./Run: /bin/rm: cannot execute binary file
...
[continues for as long as the kernel lives, but I/O ends on kernel lockup]

Although nothing was logged anywhere (since the root filesystem went to never-never-land), the virtual console showed a long series of errors similar to this one (hand copied):

EXT2-fs error (device 03:41): ext2_find_entry: bad entry in directory #2048: rec_len % 4 != 0 - offset=0, inode=136598667, rec_len=5258, name_len=34832
...
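
As a cross-check on the hand-copied message, the short Python sketch below decodes the device field and re-does the arithmetic on the reported rec_len. It rests on two assumptions that are not stated in this report: that the kernel prints the device as major:minor in hexadecimal, and that major 3 is the first IDE interface with 64 minors per drive. The helper names decode_dev and ide_name are made up for illustration.

```python
# Illustrative helpers only; the names are hypothetical and not from the kernel.

def decode_dev(dev):
    """Split a 'MM:mm' device string into (major, minor).
    Assumption: both fields are hexadecimal."""
    major_hex, minor_hex = dev.split(":")
    return int(major_hex, 16), int(minor_hex, 16)


def ide_name(major, minor):
    """Map a device number on the first IDE interface to a name like 'hdb1'.
    Assumption: major 3 covers hda and hdb, with 64 minors per drive."""
    if major != 3 or not 0 <= minor < 128:
        raise ValueError("only major 3 with minors 0..127 is handled here")
    drive = "ab"[minor // 64]   # 0..63 -> hda, 64..127 -> hdb
    part = minor % 64           # 0 means the whole disk
    return "hd%s%d" % (drive, part) if part else "hd%s" % drive


major, minor = decode_dev("03:41")
print(major, minor, ide_name(major, minor))   # prints: 3 65 hdb1

# The message complains that rec_len is not a multiple of 4.
print(5258 % 4)                               # prints: 2
```

Under those assumptions the error refers to an IDE partition rather than /dev/sda1, which matches the observation that the root (IDE) filesystem is the one being damaged, and the reported rec_len of 5258 does fail the multiple-of-4 check.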

Seems the kernel lost all sense of the root filesystem.

This is replicable. If I reboot and start the same job again, it does exactly the same thing at some point during the Filesystem Throughput test.
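
One way to pin down when the damage appears on a rerun would be to checksum a few root-filesystem binaries before the test and compare them afterwards. The following is only a sketch of that idea, not something that was run for this report. The function names snapshot and changed are hypothetical, the watched paths are simply the binaries named in the output above, and reads served from the buffer cache may not reflect what is actually on disk.

```python
import hashlib
import os


def snapshot(paths):
    """Return {path: SHA-256 hex digest} for each file in paths."""
    digests = {}
    for path in paths:
        with open(path, "rb") as f:
            digests[path] = hashlib.sha256(f.read()).hexdigest()
    return digests


def changed(before):
    """Return the sorted list of files that now differ from the snapshot
    or can no longer be read at all."""
    bad = []
    for path, digest in before.items():
        try:
            with open(path, "rb") as f:
                now = hashlib.sha256(f.read()).hexdigest()
        except OSError:
            now = None          # unreadable counts as changed
        if now != digest:
            bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Binaries that failed to execute in the benchmark output.
    watched = [p for p in ("/bin/sync", "/bin/sh", "/bin/rm") if os.path.isfile(p)]
    before = snapshot(watched)
    # ... run the disk-heavy test against the SCSI filesystem here ...
    print(changed(before))
```

If the root filesystem is already badly damaged the checker itself may fail to start, so this would only help catch the early stages of the corruption.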

On reboot, e2fsck finds errors on the / (root) IDE drive and fixes them. It reports no errors on the SCSI disks (i.e. they are not found to be dirty, although the I/O in the test goes via SCSI and not IDE).

This behavior was not apparent in the 1.3.5x kernels. A possible source of the problem is the recent changes to the SCSI aic7xxx code in the 1.3.6x kernels.

For 1.3.66 with the newer aic7xxx code, the boot log shows:

Feb 18 15:13:38 laddie kernel: aic7xxx: BurstLen = 8 DWDs, Latency Timer = 32 PCLKS
Feb 18 15:13:38 laddie kernel: aic7xxx: AHA-2940 Rev B.
Feb 18 15:13:38 laddie kernel: aic7xxx: devconfig = 0x500.
Feb 18 15:13:38 laddie kernel: aic7xxx: Reading SEEPROM...done.
Feb 18 15:13:38 laddie kernel: aic7xxx: Extended translation enabled.
Feb 18 15:13:38 laddie kernel: aic7xxx: Using 16 SCB's; No SCB memory check.

The BurstLen and SCB messages come from new features in this driver version...

For now I have replaced the aic7xxx* code in 1.3.66 with the version from 1.3.57, and the problem goes away. The test completes normally without any trouble.

Any ideas as to what might be wrong with the newer aic7xxx code? I have the feeling that there may be a bug in the PCI bus mastering code in the Adaptec driver that conflicts with the Triton 82371 IDE bus mastering routines, but I can't track it down.

FYI, here is the proc info (from my modified 1.3.66 kernel with the old aic7xxx code substituted):

>cat /proc/pci

PCI devices found:
  Bus 0, device 12, function 0:
    SCSI storage controller: Adaptec AIC-7871 (rev 0).
      Medium devsel. Fast back-to-back capable. IRQ 10. Master Capable. Latency=32. Min Gnt=8.Max Lat=8.
      I/O at 0xe400.
      Non-prefetchable 32 bit memory at 0xf8ff0000.
  Bus 0, device 11, function 0:
    VGA compatible controller: Cirrus Logic GD 5434 (rev 251).
      Fast devsel.
      Non-prefetchable 32 bit memory at 0xf9000000.
  Bus 0, device 7, function 1:
    IDE interface: Intel 82371 Triton PIIX (rev 2).
      Medium devsel. Fast back-to-back capable. Master Capable. Latency=32.
      I/O at 0xe800.
  Bus 0, device 7, function 0:
    ISA bridge: Intel 82371 Triton PIIX (rev 2).
      Medium devsel. Fast back-to-back capable. Master Capable. No bursts.
  Bus 0, device 0, function 0:
    Host bridge: Intel 82437 (rev 2).
      Medium devsel. Master Capable. Latency=32.

>cat /proc/scsi/aic7xxx/0

Adaptec AIC7xxx driver version: 2.15/2.2/2.3

Compile Options:
  AIC7XXX_RESET_DELAY    : 15
  AIC7XXX_TWIN_SUPPORT   : Enabled
  AIC7XXX_TAGGED_QUEUEING: Disabled
  AIC7XXX_SHARE_IRQS     : Enabled
  AIC7XXX_PROC_STATS     : Disabled

Adapter Configuration:
  SCSI Adapter: AHA-2940
  Host Bus: Single
  Base IO: 0xd800
  IRQ: 10
  SCB: 2 (16)
  Interrupts: 97589
  Serial EEPROM: True
  Pause/Unpause: 0x0e/0x0a
  Extended Translation: Enabled
  SCSI Bus Reset: Disabled
  Ultra SCSI: Disabled