Re: sata_nv issues with MCP51 SATA controller

From: auxsvr
Date: Fri Sep 14 2007 - 16:28:52 EST


Hello,

I get a similar, if not identical, problem with an ASUS A8N SLI motherboard
(nForce4 based). The PC (with a Seagate SATA-2 120 GB HDD) ran fine for two
years. Last Christmas, without any change to hardware or drivers, Windows XP
started crashing, and the filesystem got corrupted beyond repair within 8
hours after every installation. The system log contained entries about bad
sectors and, based on the Seagate diagnosis tool, I returned the system to
the supplier. According to the retail shop, neither the disk nor the system
had any problems, so I had to pay for a replacement disk. The replacement
HDD (Seagate again, 120 GB) ran fine until a month ago (this time the system
is connected to a UPS), when the same problem occurred! I moved the disk to
a Linux system with a Promise TX2plus controller (the one I'm typing this
from), found bad sectors, formatted it, and it has worked fine through at
least 6 hours of continuous disk writes and reads in this system. If I
return the disk to the nForce4 system, it becomes corrupted within a few
hours of disk access, no matter whether Linux or Windows is installed, and
regardless of NCQ settings, drivers and cables.

The symptoms are the same in both cases: the system crashes, then runs for
some hours, then the controller stops responding completely. The first error
message is

  ata1: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0x2 frozen

The disk access LED blinks continuously, Linux 2.6.18 (openSUSE 10.2) throws
lots of error messages similar to the ones you mention above, then Linux
says that the device is dead and the system becomes unusable (no disk
access). After a reboot, the filesystem is fine for some time; afterwards
similar error messages appear, seek errors appear and the filesystem ends up
completely destroyed. The positive part of this ordeal is that the Linux
SATA error handling works fine and Linux recovered the first time (without
access to the drive, of course), while Windows crashed badly and I was
unable to find out what was happening at first.
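
For anyone who wants to read the SErr value in the message above, here is a
minimal Python sketch that splits it into named flags. The bit positions
follow the SERR_* constants in the kernel's include/linux/libata.h; the
function names (decode_serr, parse_exception_line) and the description
strings are made up for this illustration and are not part of any kernel or
library API.

```python
import re

# SError bit positions, following the SERR_* constants in the Linux
# kernel's include/linux/libata.h. The description strings are this
# sketch's own wording.
SERR_BITS = {
    0: "recovered data error",
    1: "recovered comm failure",
    8: "unrecovered data error",
    9: "persistent data/comm error",
    10: "protocol violation",
    11: "host internal error",
    16: "PHY RDY changed",
    17: "PHY internal error",
    18: "comm wake",
    19: "10b to 8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unrecognized FIS",
    26: "device exchanged",
}


def decode_serr(value):
    """Return the names of all known SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


def parse_exception_line(line):
    """Pull Emask, SAct and SErr out of a libata 'exception' line (hypothetical helper)."""
    m = re.search(
        r"Emask (0x[0-9a-fA-F]+) SAct (0x[0-9a-fA-F]+) SErr (0x[0-9a-fA-F]+)", line
    )
    if m is None:
        return None
    emask, sact, serr = (int(g, 16) for g in m.groups())
    return {"emask": emask, "sact": sact, "serr": serr,
            "serr_flags": decode_serr(serr)}


line = "ata1: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0x2 frozen"
info = parse_exception_line(line)
print(info["serr_flags"])
```

For 0x1810000 this reports bits 16, 23 and 24 (PHY RDY changed, link
sequence error, transport state transition error). Those are link-level
flags rather than media errors, which may fit a controller or link problem,
but on its own this does not prove where the fault lies.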

I cannot say with certainty that this is a hardware error or damage. Seagate
technical support insists that their HDD is at fault, which is obviously
wrong. The PC is (after the second incident) connected to a UPS and was
checked by the service at the shop. The strangest thing, which I cannot
explain, is that the system ran fine for 8 months after I changed the disk,
even though the disk wasn't damaged! Either the motherboard is damaged or
faulty (but then how did it run fine for 8 months after I changed the
disk?), or there is some very weird interaction between the HDD and the SATA
controller. The latter isn't unlikely, considering the problems reported
with combinations of nForce4 and Maxtor HDDs, yet it still doesn't explain
the 2-year and 8-month periods of normal operation. I'm going to contact the
service again and see how this turns out.