Re: Strange system hangs

From: Nick Piggin
Date: Sat Sep 29 2007 - 08:45:59 EST


On Friday 28 September 2007 18:42, Krzysztof Oledzki wrote:
> Hello,
>
> I am experiencing weird system hangs. Once about 2-5 weeks system freezes
> and stops accepting remote connections, so it is no longer possible to
> connect to most important services: smtp (postfix), www (squid) or even
> ssh. Such connection is accepted but then it hangs.
>
> What is strange, that previously established ssh session is usable. It is
> possible to work on such system until you do something stupid like "less
> /var/log/all.log". Using strace I found that process blocks on:

Is this a regression? If so, what's the most recent kernel that didn't show
the problem?

The symptoms could be consistent with some place doing a
balance_dirty_pages while holding a lock that is required for IO, but I can't
see a smoking gun (you've got contention on i_mutex, but that should be
OK).

Can you see if there is any memory under writeback that isn't being
completed (sysrq+M), also a list the locks held after the hang might be
helpful (compile in lockdep and sysrq+D)

Is anything currently running? (sysrq+P and even a full sysrq+T task list
could be useful).

Are any IO errors occurring at all?

Thanks,
Nick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/