Bernie Innocenti wrote:
I have heard that the same thing happened with the same kind of
configuration, using a Supermicro H8DME-2 motherboard and an Opteron 2378 CPU.
The error in the subject appears in the console, immediately followed by
a hard freeze of the machine. The error occurs reproducibly on two
identical Opteron servers, each one equipped with two identical
controller cards:
03:04.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
03:06.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
We can trigger the problem within a few seconds by starting a
reconstruction on a drive hooked to port 4 (counting from 0) of the
second controller. Oddly, every other drive works reliably and the
faulty drive works if we connect it to, for example, port 4 of the first
controller.
Tested with Debian kernels 2.6.26-19 and 2.6.30-8. Let me know if
further details are needed.
0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040....
0x30000040 here means "MRdPerr":
"bad data parity detected during PCI master read".
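The cause value is a bitmask, and 0x30000040 has more than one bit set. The small sketch below just lists which bit positions are set. The helper name `set_bits` is hypothetical, and the code deliberately does not label the bits: which of them is the MRdPerr flag, and what the others mean, should be checked against the 88SX6081 documentation or the sata_mv driver source rather than assumed.

```python
from typing import List


def set_bits(value: int, width: int = 32) -> List[int]:
    """Return the positions of all set bits in a register value, lowest first.

    Hypothetical helper for illustration; it only does the arithmetic and
    does not know what any bit means on the Marvell controller.
    """
    return [bit for bit in range(width) if value & (1 << bit)]


cause = 0x30000040  # value printed in the "PCI IRQ cause" log line
for bit in set_bits(cause):
    print(f"bit {bit:2d}: mask 0x{1 << bit:08x}")
```

Running it prints three lines, for bits 6 (mask 0x00000040), 28 (0x10000000) and 29 (0x20000000).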
This means that a data parity error was detected during an outgoing
data transfer on the PCI-X bus, i.e. while the controller, acting as
bus master, was reading data.
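To illustrate what a "data parity error" is, here is a minimal sketch assuming conventional PCI-style even parity, where the PAR signal is chosen so that AD[31:0], C/BE[3:0]# and PAR together carry an even number of 1s. This is a simplified model for illustration, not the controller's actual logic, and the function names are hypothetical.

```python
def popcount(x: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def even_parity_bit(ad: int, cbe: int) -> int:
    """PAR value that makes AD[31:0] + C/BE[3:0]# + PAR hold an even number of 1s."""
    return (popcount(ad & 0xFFFFFFFF) + popcount(cbe & 0xF)) & 1


def parity_ok(ad: int, cbe: int, par: int) -> bool:
    """Receiver-side check: the total count of 1s must be even."""
    return (popcount(ad & 0xFFFFFFFF) + popcount(cbe & 0xF) + (par & 1)) % 2 == 0


sent_ad = 0x12345678
cbe = 0b0000                       # all byte enables asserted (active low)
par = even_parity_bit(sent_ad, cbe)

received_ad = sent_ad ^ (1 << 5)   # simulate one bit flipped in transit, e.g. by noise
print(parity_ok(sent_ad, cbe, par))      # True
print(parity_ok(received_ad, cbe, par))  # False
```

Note that a single parity bit only detects an odd number of flipped bits; two flips on the same data phase would go unnoticed, so a reported error means at least one bit was actually corrupted on the way.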
This could be caused by noise on the bus, failing capacitors,
or possibly bad RAM (I am not sure about the last one).