Re: Warning - Maxtor SATA II and Nvidia nforce4

From: Dax Kelson
Date: Wed Mar 15 2006 - 18:20:11 EST


On Wed, 2006-03-15 at 17:47 -0500, Jeff Garzik wrote:
> Ah, I see this made it to LKML :)
>
> Dax Kelson wrote:
> > Short version
> > ==============
> > Nvidia Nforce4 chipset with Maxtor SATA II drives with certain firmware
> > revisions cause data corruption and system instability when under
> > moderate to heavy I/O load.
>
> I'm a bit suspicious of this.
>
> Looking at the link, there are three problem areas and two problem blame
> targets implied:
>
> Data corruption -> blame nvidia driver
> NCQ -> blame nvidia driver
> Detection -> blame maxtor firmware
>
> The first one likely applies to the Windows driver not Linux's sata_nv,
> and thus irrelevant here.

No.

Take a big file (5-10 GB):

$ cp bigfile newfile
$ cp bigfile newfile2
$ cp bigfile newfile3
$ cp bigfile newfile4
$ md5sum bigfile newfile*
[the checksums all differ, assuming the kernel doesn't panic during the test]
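
For anyone who wants to repeat this, the copy-and-compare test above could be scripted roughly as follows. This is only a sketch: the function name `verify_copies` and the `.copyN` file suffix are made up for illustration, and it assumes a POSIX shell with `cp` and `md5sum` available. The test file should be larger than RAM, otherwise the checksums may be computed from the page cache rather than from what actually landed on disk.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool):
# copy SRC N times and compare each copy's MD5 sum with the original's.
# Usage: verify_copies SRC [N]
# Returns 0 if every copy matches, 1 on any mismatch, 2 on a cp/md5sum error.
verify_copies() {
    src=$1
    n=${2:-4}                      # default to four copies, as in the manual test
    want=$(md5sum < "$src") || return 2
    bad=0
    i=1
    while [ "$i" -le "$n" ]; do
        dst="$src.copy$i"
        cp "$src" "$dst" || return 2
        got=$(md5sum < "$dst") || return 2
        if [ "$got" != "$want" ]; then
            echo "MISMATCH: $dst" >&2
            bad=1
        fi
        i=$((i + 1))
    done
    return "$bad"
}
```

Calling `verify_copies bigfile 4` leaves the copies in place so that mismatching files can be inspected afterwards.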

When I use the "stress" utility from
http://weather.ou.edu/~apw/projects/stress/
the box usually makes it an hour or two before a kernel panic or I/O
errors wedge it.

I set up a netdump/netconsole server on my network and I have several
crashes captured. If you are interested I can send them on to you. I
filed most of them in the Red Hat bugzilla, but closed them after I
discovered they were caused by a hardware problem.

> The second one OBVIOUSLY applies only to
> Windows, since sata_nv (and libata itself) don't yet enable NCQ. The
> third one could potentially apply to Linux. Lastly, your mention of
> "nforce fake raid" almost certainly indicates Windows or proprietary
> drivers.

Linux device mapper is proprietary? :)

The corruption occurs with a single disk or when using a device mapper
"nvraid".

> Therefore, I ask:
> * are you reporting only a drive detection problem?

No. Detection was never a problem for me.

> * why are you reporting unrelated Windows problems to a Linux list?

I'm not, see above.

> * if you are indeed reporting a problem on Linux, where is the kernel
> and driver version info, as requested in REPORTING-BUGS?

Well, what can Linux do about this hardware problem? Maybe there is a
workaround that can be done, but I'm not counting on it. A warning would
be nice if it is possible to detect the conditions under which this can
occur. That way others can troubleshoot and identify this problem more
quickly.

I mostly used late-model FC5 rawhide kernels, which I believe are based
on 2.6.16-rc5-git12/git13 or thereabouts.

> * and can you provide such info *and reproduce the problems* without
> proprietary drivers loaded?

Sorry for the misunderstanding. Again, no proprietary drivers ever
loaded. Problem is 100% reproducible. See above, etc.

Dax Kelson

