Tejun Heo wrote:
After an uptime of 13:34 under heavy load and no errors, I'm pretty sure your patch is correct. Is there a way to backport this to 2.6.18.x?
Pablo Sebastian Greco wrote:
Here's boot dmesg with 2.6.20-rc3 + blacklist. And you are right about only affecting samsung drives, but since only those drives get all the heavy load, couldn't tell exactly.
By crash I mean the whole system going down, having to reset the entire
machine.
I'm sending you 4 files:
dmesg: current boot dmesg, just a boot, because no errors appeared after
last crash, since the server is out of production right now (errors
usually appear under heavy load, and this is primarily a transparent proxy
for about 1000 simultaneous users)
lspci: the way you asked for it
messages and messages.1: files where you can see old boots and crashes
(even a soft lockup).
If there is anything else I can do, let me know. If you need direct
access to the server, I can arrange that too.
Can you try 2.6.20-rc3 and see if the 'CLO not available' message goes away
(please post boot dmesg)?
The crash/lock is because filesystem code does not cope with IO errors
very well. I can't tell why timeouts are occurring in the first place.
It seems that only samsung drives are affected (sda2, 3, 4). Hmmm...
Please apply the attached patch to 2.6.20-rc3 and test it.
Thanks.
I'm putting the server in production right now, so I think in a few hours I'll have more info.
Thanks.
Pablo.