Re: Updated on UDMA BadCRC errors + subsequent problems

From: Ed Sweetman
Date: Fri Jan 16 2004 - 11:14:31 EST


John Bradford wrote:
Quote from Jonathan Kamens <jik@xxxxxxxxxxxxxxxxxxxxxx>:

John Bradford writes:
> Quote from Jonathan Kamens <jik@xxxxxxxxxxxxxxxxxxxxxx>:
> > ... hde: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
> > ... hde: drive_cmd: error=0x04 { DriveStatusError }
> > The drive doesn't seem to understand the command it was sent.

I'm not sure what this means, but assuming that it's going to happen
again at some point,


Maybe not - the most common cause I've seen for that message in the logs is trying to access S.M.A.R.T. information when S.M.A.R.T. is disabled.

I.E. the error should be reproducable with:

# smartctl -d /dev/hda
# smartctl -a /dev/hda

Are you sure you weren't trying to get S.M.A.R.T. info from the drive at the time the error was logged?

John.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }


Some drives i guess report the exact error like mine here. These occur when i'm transferring from my tulip based nic to my hdd at 8Mbytes a second (avg). The fs is ext3. When i'm transferring about 1.5GB files over the drive seems to freak out. Timing sources (tsc) can also lose so many ticks that the other time source has to be used.

What i dont understand is why the ata drivers dont handle crc errors correctly. Instead of resetting the ide bus and turning dma off why dont they start throttling down the dma modes one by one, When the rate of crc errors reaches a certain reasonable number, drop an udma level. If that crc error rate is reached again, drop a level. You keep doing that until you hit pio mode. Usually the problem is solved by simply using a lower dma mode. That way my system doesn't have to reach loads of 20 and io suck all my cpu while i'm trying to re-enable dma so i can actually figure out what's going on. CRC errors are caused by timing problems as well as physical problems around the cabling in the computer. Normal hdd to hdd transfers (which avg about 30MByte/sec) do not cause these errors for me.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/