Re: SATA problems

From: Pablo Sebastian Greco
Date: Thu Jan 04 2007 - 08:18:34 EST


Tejun Heo wrote:
> Pablo Sebastian Greco wrote:
>> By crash I mean the whole system going down, having to reset the entire
>> machine.
>> I'm sending you 4 files:
>> dmesg: current boot dmesg, just a boot, because no errors appeared after
>> last crash, since the server is out of production right now (errors
>> usually appear under heavy load, and this is primarily a transparent proxy
>> for about 1000 simultaneous users)
>> lspci: the way you asked for it
>> messages and messages.1: files where you can see old boots and crashes
>> (even a soft lockup).
>> If there is anything else I can do, let me know. If you need direct
>> access to the server, I can arrange that too.
>
> Can you try 2.6.20-rc3 and see if 'CLO not available' message goes away
> (please post boot dmesg)?
>
> The crash/lock is because filesystem code does not cope with IO errors
> very well. I can't tell why timeouts are occurring in the first place.
> It seems that only samsung drives are affected (sda2, 3, 4). Hmmm...
> Please apply the attached patch to 2.6.20-rc3 and test it.
>
> Thanks.

Here's the boot dmesg with 2.6.20-rc3 + blacklist. You are right that only
the Samsung drives are affected, but since those are the only drives that
get the heavy load, I couldn't tell for sure.
I'm putting the server back in production right now, so I think I'll have
more info in a few hours.

Thanks.
Pablo.

Attachment: dmesg.bz2
Description: Binary data