Re: WD IDNF error

From: Justin Piszcz
Date: Mon Jan 26 2009 - 07:15:50 EST




On Mon, 26 Jan 2009, Anton Petrusevich wrote:

Hello,


Google shows me you had a similar problem. I have a WDC WD3000HLFS-01G6U0 in
one my server and this drive is used for oracle database, and it's already
second time when I get error like this:

Jan 26 11:13:31 rc03 kernel: [5131931.511985] ata2.00: status: { DRDY ERR }
Jan 26 11:13:31 rc03 kernel: [5131931.511985] ata2.00: error: { IDNF }

Several months ago when it happend first time I extensively checked the drive
after reboot and it worked flawlessly, but database was corrupted. This time
I just rebooted the server and overwrote one control file and everything goes
normal. This error is actually bothers me as I don't actually like to restore
database from backup which can be sometimes a little bit outdated.

So, have you find a solution? My kernel is 2.6.26-1-amd64, I read about NCQ,
I see:
Jan 26 14:24:16 rc03 kernel: [ 3.759922] ata2.00: ATA-8: WDC
WD3000HLFS-01G6U0, 04.04V01, max UDMA/133
Jan 26 14:24:16 rc03 kernel: [ 3.759995] ata2.00: 586072368 sectors, multi
16: LBA48 NCQ (depth 0/32)

So, NCQ is already disabled.
--
Anton Petrusevich


Please see:
http://forums.storagereview.net/index.php?showtopic=27303&hl=premature

Other users are also reporting this, the fix for me was to replace my drives with WD RE3s. After 9-10 RMAs and no fix, it is some bug in the kernel or drive firmware, after 6-12mos of RMA and corrupted files, broken raids and weeks of lost time dealing with this problem, I removed all my WD Velociraptors from production, replaced with WD RE3 and have not had a problem since.

Turning off NCQ fixes the problem with Raptor's and Velociraptors too, where NCQ doesn't work well with Linux for these particular drives; however, the DRDY ERR and IDNF errors-- there's something wrong with the Velociraptors in Linux, I'd suggest subscribing to that forum and adding your notes there, that is where I have been tracking the issue primarily.

I have cc'd linux-kernel and linux-ide here for further suggestions, unfortunately however, I do not know of anyone who works at WD who participates on these lists so I am not sure how to get the problem escalated. When I tried to open cases with WD about the problem, their solution was (again) to replace all the HDDs, etc. As far as I am concerned the WD Velociraptor is defective, at least in Linux (both GLFS/HLFS) versions that I used and the 9-10 RMAs as well, I think there needs to be more bug reports etc and tracking on this issue, otherwise, it will probably never be solved. However, if you have important data, I'd recommend not storing it on a VR unless you have it backed up elsewhere, and verify the backups regularly.

Justin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/