Silicon Image SATA woes

From: Lasse Kärkkäinen / Tronic
Date: Fri Jul 16 2004 - 17:31:12 EST

Running 2.6.7-cko1 on Athlon64, VIA K8T800, Epox 8hda3+. All the SATA disks are Samsung SP1614C. There's a nine-disk RAID-5 (5x SP1614C, 4x SP1614N) with ReiserFS on top of it.

The Q-TEC SiI3112ATC144 two-port PCI SATA card (its disks show up as hde and hdg) appears to work fine and never gives any errors in the kernel log. The drawback is that it corrupts data randomly, about once in every ten gigabytes of data read. I don't know whether this affects writes. I tested by repeatedly dd'ing an unused swap partition to a file and then diffing it against the known contents. I first noticed the problem when unrar reported unexpected, random CRC errors; after that PAR2 found errors and md5sums did not match.
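A read test like the one described above could be automated roughly as follows. This is only a minimal sketch, not the procedure actually used: it compares an MD5 digest instead of diffing a dd copy, and the names `read_digest` and `count_mismatches` are made up for illustration. It assumes the expected digest of the known contents is already available, and that the region read is larger than RAM (or the cache is otherwise bypassed), since repeated reads can be served from the page cache without touching the disk.

```python
import hashlib


def read_digest(path, chunk_size=1 << 20):
    """Read a whole file or block device and return its MD5 hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()


def count_mismatches(path, expected_digest, passes=10):
    """Read `path` `passes` times; count reads whose digest differs
    from the digest of the known contents."""
    bad = 0
    for _ in range(passes):
        if read_digest(path) != expected_digest:
            bad += 1
    return bad
```

Any nonzero result from `count_mismatches` on a partition that nothing writes to means at least one read returned different data than expected.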

The on-board SiI3114 has a different kind of problem. After a few minutes of heavy use I get the following in the kernel log:
ata2: DMA timeout, stat 0x61

After this happens, any process that tries to access the filesystem, /dev/md0, /proc/mdstat or /dev/sdX (where X is a, b or c; this seems to vary from crash to crash) hangs. The log entry always seems to be about ata2. I have tried reordering the disks, but that did not help. However, the problem didn't surface with only one disk on the controller.
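To check whether the timeouts really are always reported for the same port, the kernel log can be tallied per port. The following is a small sketch that only assumes the message format quoted above; `tally_dma_timeouts` is a made-up helper name, and it takes any iterable of log lines (for example a saved copy of the dmesg output).

```python
import re
from collections import Counter

# Matches messages of the form "ata2: DMA timeout, stat 0x61".
TIMEOUT_RE = re.compile(r"\b(ata\d+): DMA timeout, stat 0x[0-9a-fA-F]+")


def tally_dma_timeouts(lines):
    """Count DMA timeout messages per ATA port name."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

If every entry ends up under `ata2` no matter how the disks are ordered, that points at the port rather than at a particular disk.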

I tried 2.6.8-rc1, but that was so unstable (*) that I could hardly boot the system and couldn't keep running on it. However, I confirmed that the 3112 data corruption also occurs there.

*) for instance, running ifconfig (by the startup scripts) OOPSed the kernel and halted the system until ifconfig was killed; unfortunately, I didn't get the log of this one.

I need a quick answer: are these hardware faults, and should I return the hardware under warranty? I could also get some other hardware instead of that 3112... Would that be wise?

- Tronic -
