I have an Athlon64 machine running kernel 2.6.10-rc3 (but this problem has happened on 2.6.9-ac7 as well) with a VIA VT6420 SATA controller. Every few days (the problem is not chronologically consistent) and/or when there's heavy disk usage, the main SATA disk (a Western Digital model WDC WD1200JD-00G) will just completely stop responding to any I/O, and a lot of SCSI error messages will be output to the console. After a few instances of this happening (which requires a hard power-off, then power-on.. just hitting the reset button causes the SATA controller not to recognize the drive on boot), I finally managed to capture some of the kernel messages, since somehow I could still read one of my log files (cached in memory, I guess). The same set of errors just keep repeating over and over. I also believe there was an ext3 error that showed up on the console and not in the log, but I assume this is not an ext3 problem anyway. The partial log file and the output of lspci -vvv are attached. I have no idea whether this is a software or hardware problem. Running Western Digital's diagnostics on the drive turned up no errors. If anyone has seen this problem before and it turned out to be hardware-related, I'd like to find out exactly which component is the culprit.
Thanks in advance,