Re: Strange system hangs

From: Krzysztof Oledzki
Date: Sun Dec 02 2007 - 10:10:21 EST




On Sat, 29 Sep 2007, Nick Piggin wrote:

> On Friday 28 September 2007 18:42, Krzysztof Oledzki wrote:
>> Hello,
>>
>> I am experiencing weird system hangs. Once about 2-5 weeks system freezes
>> and stops accepting remote connections, so it is no longer possible to
>> connect to most important services: smtp (postfix), www (squid) or even
>> ssh. Such connection is accepted but then it hangs.
>>
>> What is strange, that previously established ssh session is usable. It is
>> possible to work on such system until you do something stupid like "less
>> /var/log/all.log". Using strace I found that process blocks on:
>
> Is this a regression? If so, what's the most recent kernel that didn't show
> the problem?
>
> The symptoms could be consistent with some place doing a
> balance_dirty_pages while holding a lock that is required for IO, but I can't
> see a smoking gun (you've got contention on i_mutex, but that should be
> OK).
>
> Can you see if there is any memory under writeback that isn't being
> completed (sysrq+M), also a list the locks held after the hang might be
> helpful (compile in lockdep and sysrq+D)
>
> Is anything currently running? (sysrq+P and even a full sysrq+T task list
> could be useful).
>
> Are any IO errors occurring at all?

It seems that 2.6.23.x still fails, but in a somewhat different way. I have
updated my bug report at http://bugzilla.kernel.org/show_bug.cgi?id=9182
with new attachments: traces and an oops that occurred while I was
collecting the debugging data.
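
For anyone trying to collect the same data on a box that is only reachable
over an already-open ssh session, the dumps requested above (sysrq+M, sysrq+D,
sysrq+P, sysrq+T) can usually be triggered by writing the command letter to
/proc/sysrq-trigger as root; the output goes to the kernel log. The following
is only a minimal sketch of that idea, not the procedure actually used for the
attachments in the bug report. The function name trigger_sysrq and the
SYSRQ_DUMPS table are made up for illustration, and sysrq+D only produces
useful output on a kernel built with lockdep.

```python
# Hypothetical helper: trigger the SysRq dumps discussed in this thread.
# Assumes a Linux kernel built with CONFIG_MAGIC_SYSRQ and root privileges.

# Command letters for the dumps requested above.
SYSRQ_DUMPS = {
    "m": "memory info (sysrq+M)",
    "d": "locks currently held (sysrq+D, needs lockdep)",
    "p": "registers and flags of the running task (sysrq+P)",
    "t": "full task list (sysrq+T)",
}


def trigger_sysrq(keys, trigger_path="/proc/sysrq-trigger"):
    """Write each SysRq letter to the trigger file, one write per letter.

    Returns the list of letters that were written. The dumps themselves
    appear in the kernel log (dmesg, serial console, or netconsole).
    """
    sent = []
    for key in keys:
        if key not in SYSRQ_DUMPS:
            raise ValueError(f"unsupported SysRq key: {key!r}")
        # Each command is a separate open/write/close, like `echo m > ...`.
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
        sent.append(key)
    return sent
```

On the affected machine this would be called as
trigger_sysrq(["m", "d", "p", "t"]) from a root shell, with the kernel log
captured somewhere that does not depend on the hung filesystem, for example a
serial console.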

Thank you.

Best regards,


Krzysztof Olędzki