It's been 4 Samsung drives, all hanging on a SiI 3124. The only non-failing drive was sdf, as it was running in standby mode in this md raid5 ensemble:
20080323-011337-sdc.log:195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 162956700
20080323-011338-sde.log:195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 148429049
Hmmm... If the drive is failing FLUSHes, I would expect to see elevated
reallocation counters and maybe some pending counts. Aieee.. weird.
But there are no reallocations nor any pending sectors on any of them.
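For cross-checking those counters, SMART attribute 5 is Reallocated_Sector_Ct, 197 is Current_Pending_Sector and 195 is the Hardware_ECC_Recovered value from the logs above. smartctl -A reports them, but they can also be read directly over the HDIO_DRIVE_CMD ioctl; below is only a rough sketch (no real error handling, usually needs root), not anything from this thread:

/* smartattr.c - dump SMART attributes 5, 195 and 197 via HDIO_DRIVE_CMD.
 * Minimal sketch; smartctl -A /dev/sdX reports the same values.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

#define ATA_SMART_CMD      0xb0   /* SMART command opcode */
#define SMART_READ_VALUES  0xd0   /* SMART READ DATA subcommand */

int main(int argc, char **argv)
{
    /* HDIO_DRIVE_CMD layout: args[0]=command, args[1]=sector number,
     * args[2]=feature, args[3]=sector count; 512 bytes of data follow. */
    unsigned char args[4 + 512];
    int fd, i;

    if (argc != 2) {
        fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY | O_NONBLOCK);
    if (fd < 0) { perror("open"); return 1; }

    memset(args, 0, sizeof(args));
    args[0] = ATA_SMART_CMD;
    args[2] = SMART_READ_VALUES;
    args[3] = 1;                           /* one 512-byte sector back */

    if (ioctl(fd, HDIO_DRIVE_CMD, args)) { perror("HDIO_DRIVE_CMD"); return 1; }

    /* Attribute table starts at offset 2 of the data, 12 bytes per entry:
     * id, 2 flag bytes, current, worst, 6 raw bytes (little-endian), reserved. */
    for (i = 0; i < 30; i++) {
        unsigned char *a = args + 4 + 2 + i * 12;
        if (a[0] == 5 || a[0] == 195 || a[0] == 197) {
            unsigned long long raw = 0;
            int b;
            for (b = 5; b >= 0; b--)
                raw = (raw << 8) | a[5 + b];
            printf("attr %3u  value %3u  worst %3u  raw %llu\n",
                   a[0], a[3], a[4], raw);
        }
    }
    close(fd);
    return 0;
}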
FLUSH_EXT timing out usually indicates that the drive is having
problems writing out what it has in its cache to the media. There was
one case where a FLUSH_EXT timeout was caused by the driver failing to
switch the controller back from NCQ mode before issuing FLUSH_EXT, but
that was on sata_nv. There hasn't been any similar problem on sata_sil24.

It should have appeared as read errors (write errors, I guess). Maybe
the drive successfully wrote those sectors after the 30+ second timeout.

Hmm, I didn't notice any data distortions, and if there were any, they
live on as copies in their new home..
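If it helps to hit the flush path outside of md, something like the sketch below times fsync() in a loop and reports any flush that stalls. The assumption here is that fsync() actually reaches the drive as a FLUSH CACHE, which holds for a file on a barrier/flush-enabled filesystem and for a block device node on recent kernels; the path and thresholds are only examples:

/* flushtime.c - time fsync()-triggered cache flushes on the suspect disk.
 * Rough sketch; run against a file on the array, e.g.
 *   ./flushtime /mnt/raid/flush-test
 * Older glibc needs -lrt for clock_gettime().
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(int argc, char **argv)
{
    char buf[4096];
    struct timespec t0, t1;
    double ms;
    int fd, i;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file-on-suspect-disk>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_WRONLY | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return 1; }
    memset(buf, 0xaa, sizeof(buf));

    for (i = 0; i < 100; i++) {
        /* dirty a little data so the flush has something to push out */
        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) { perror("write"); return 1; }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (fsync(fd)) { perror("fsync"); return 1; }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ms = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
        if (ms > 1000.0)          /* anything near the 30s command timeout is bad */
            printf("flush %d took %.0f ms\n", i, ms);
        sleep(1);
    }
    close(fd);
    return 0;
}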
That would point to some driver issue, wouldn't it? Roger Heflin also
experienced similar behavior with that controller, which wasn't reproducible with another.
I can offer to rebuild that md array in a test environment and give you access to it, if you're interested.
Here are the errors I get; looking at it more closely, I don't appear to be getting the reset, just this error from time to time:
sd 9:0:0:0: [sde] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sde] Write Protect is off
sd 9:0:0:0: [sde] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
ata8.00: BMDMA2 stat 0x687d8009
ata8.00: cmd 25/00:80:a7:00:1d/00:01:1d:00:00/e0 tag 0 cdb 0x0 data 196608 in
res 51/04:8f:98:01:1d/00:00:1d:00:00/f0 Emask 0x1 (device error)
ata8.00: configured for UDMA/100
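For what it's worth, the "res 51/04:..." pair is the ATA status/error registers: 0x51 is DRDY|DSC|ERR and 0x04 in the error register is ABRT, i.e. the device itself aborted the command rather than timing out and being reset. A tiny decoder for those two bytes, as a sketch (bit names per the ATA spec; some bits have been renamed over the years, e.g. DSC vs SERV, BBK vs ICRC):

/* atadecode.c - decode the status/error bytes from libata "res XX/YY:..." lines. */
#include <stdio.h>
#include <stdlib.h>

static const char *status_bits[8] = {
    "ERR", "IDX", "CORR", "DRQ", "DSC/SERV", "DF", "DRDY", "BSY"
};
static const char *error_bits[8] = {
    "AMNF", "TK0NF/NM", "ABRT", "MCR", "IDNF", "MC", "UNC", "ICRC"
};

static void decode(const char *name, unsigned v, const char **names)
{
    int bit;
    printf("%s 0x%02x:", name, v);
    for (bit = 7; bit >= 0; bit--)
        if (v & (1u << bit))
            printf(" %s", names[bit]);
    printf("\n");
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <status-hex> <error-hex>   e.g. %s 51 04\n",
                argv[0], argv[0]);
        return 1;
    }
    decode("status", strtoul(argv[1], NULL, 16) & 0xff, status_bits);
    decode("error ", strtoul(argv[2], NULL, 16) & 0xff, error_bits);
    return 0;
}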
I have 4 identical disks. With all 4 connected to the SiI controller, all of them give some errors; moving 2 of the disks to a Promise controller makes the errors go away on the 2 connected to the Promise. All drives are part of a software RAID5 array.