Re: ECC and DMA to/from disk controllers

From: Bruce Allen
Date: Wed Sep 12 2007 - 23:38:28 EST


Alan, Robert, Dick,

Thank you all for the informed and helpful responses!

Alan, I'll pass your comments on to Peter Kelemen. I'm not sure if he follows LKML, so I'll point him to the thread. I think he'll be interested in your characterization of the error types. (I think Peter and his collaborators are fairly aware of the undetected error rates in standard Ethernet TCP/IP traffic, which as I recall is about one undetected single-bit error per 4 TB transferred. I am pretty sure they have ruled this out, since they compute checksums after any network transfer.)

Robert, Dick, if I have understood correctly, the answer to my specific question is that RAID controllers on PCI cards DMA data into memory over a PCI bus that uses one parity bit per 32 data bits for protection. This does provide some protection against errors in the data transfer, but much less than typical RAM ECC, which has one ECC byte for each eight data bytes. As I recall, many older motherboards disabled parity on the PCI bus, so even this protection may be inactive in many cases. From a few minutes of online research, I have the impression that PCI-e has better protection against address/data errors than PCI, but I am not certain.
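
To make the parity comparison concrete, here is a minimal sketch in Python of a single even-parity bit over a 32-bit word. It is a simplified model, not a description of the actual bus logic: it ignores the command/byte-enable lines that real PCI parity also covers, and the names parity32 and transfer_ok are made up for illustration.

```python
def parity32(word: int) -> int:
    """Even-parity bit for the low 32 bits: 1 if the number of set bits is odd."""
    return bin(word & 0xFFFFFFFF).count("1") & 1


def transfer_ok(received_word: int, received_parity: int) -> bool:
    """Receiver-side check: recompute the parity and compare with the bit sent."""
    return parity32(received_word) == received_parity


# Sender side: an arbitrary example word and its parity bit.
word = 0xDEADBEEF
par = parity32(word)

# Simulated corruption on the bus (bit positions chosen arbitrarily).
one_bit = word ^ (1 << 7)               # a single flipped bit
two_bit = word ^ (1 << 7) ^ (1 << 19)   # two flipped bits

print(transfer_ok(word, par))      # clean transfer passes the check
print(transfer_ok(one_bit, par))   # single-bit error is detected
print(transfer_ok(two_bit, par))   # double-bit error passes unnoticed

# Check-bit overhead for the two schemes mentioned above.
parity_overhead = 1 / 32   # one parity bit per 32 data bits
ecc_overhead = 8 / 64      # one ECC byte per eight data bytes
print(parity_overhead, ecc_overhead)
```

The point of the sketch is that a lone parity bit catches any odd number of flipped bits but misses any even number, and it can never correct anything. The usual 8-check-bits-per-64 memory ECC spends four times the relative overhead and in return can correct single-bit errors and detect double-bit errors.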

Thanks again!

Cheers,
Bruce