Re: SATA problems

From: Tejun Heo
Date: Thu Jan 04 2007 - 00:05:54 EST


Pablo Sebastian Greco wrote:
> By crash I mean the whole system going down, having to reset the entire
> machine.
> I'm sending you 4 files:
> dmesg: current boot dmesg, just a boot, because no errors appeared after
> last crash, since the server is out of production right now (errors
> usually appear under heavy load, and this primarily a transparent proxy
> for about 1000 simultaneous users)
> lspci: the way you asked for it
> messages and messages.1: files where you can see old boots and crashes
> (even a soft lockup).
> If there is anything else I can do, let me know. If you need direct
> access to the server, I can arrange that too.

Can you try 2.6.20-rc3 and see if 'CLO not available' message goes away
(please post boot dmesg)?

The crash/lock is because filesystem code does not cope with IO errors
very well. I can't tell why timeouts are occurring in the first place.
It seems that only samsung drives are affected (sda2, 3, 4). Hmmm...
Please apply the attached patch to 2.6.20-rc3 and test it.

Thanks.

--
tejun
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0d51d13..f8cf349 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3327,6 +3327,8 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
/* NCQ is slow */
{ "WDC WD740ADFD-00", NULL, ATA_HORKAGE_NONCQ },

+ { "SAMSUNG SP2504C", NULL, ATA_HORKAGE_NONCQ },
+
/* Devices with NCQ limits */

/* End Marker */