(Please cc me on replies)
I have three samsung hdds (/sys/block/sda/device/model says SAMSUNG SP2504C) in a raid configuration. My system frequently (2-3x/day) experiences temporary lockups, which produce messages as below in my dmesg/syslog. The system recovers, but the hang is annoying to say the least.
All three drives are connected to sata_nv ports. Oddly, it almost always happens on ata6 or ata7 (the second and third ports of that 4 port setup on my motherboard). There is an identical drive connected at ata5, but I've only once or twice seen it hit that drive.
Googling around lkml.org, I found a few threads investigating what look like very similar problems, some of which never seemed to find the solution, but one of which came up with a fairly quick answer it seemed, namely that the drive's NCQ implementation was horked: http://lkml.org/lkml/2007/4/18/32
While I don't have older logs to verify exactly when this started, it was fairly recent, perhaps around my 2.6.20.1 to 2.6.21.1 kernel upgrade.
Any other info or tests I can provide/run to help?
Syslog snippet:
Jun 21 10:35:23 cheetah kernel: ata6: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Jun 21 10:35:24 cheetah kernel: ata6: CPB 0: ctl_flags 0x9, resp_flags 0x0
Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA IDLE, stat=0x400
Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA LEGACY, stat=0x400
Jun 21 10:35:24 cheetah kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 21 10:35:24 cheetah kernel: ata6.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Jun 21 10:35:24 cheetah kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 21 10:35:24 cheetah kernel: ata6: soft resetting port
Jun 21 10:35:24 cheetah kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 21 10:35:24 cheetah kernel: ata6.00: configured for UDMA/133
Jun 21 10:35:24 cheetah kernel: ata6: EH complete
Jun 21 10:35:24 cheetah kernel: SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
Jun 21 10:35:24 cheetah kernel: sdb: Write Protect is off
Jun 21 10:35:24 cheetah kernel: sdb: Mode Sense: 00 3a 00 00
Jun 21 10:35:24 cheetah kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA