Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

From: David Rientjes
Date: Sun Dec 02 2007 - 10:55:48 EST


On Sun, 2 Dec 2007, Ingo Oeser wrote:

> > maybe, but we'd have to see how often this gets triggered. An OOM is
> > something that could happen in any overloaded system - while a hung task
> > is likely due to a kernel bug.
>
> What about a client using hard mounted NFS shares here? That shouldn't be
> killed by the OOM killer in that situation, should it?
>

That's orthogonal to the point I was making. The problem with the OOM
killer right now is that it can easily enter an infinite loop in
out-of-memory conditions if the task it has selected to be killed fails to
exit. This only happens when that task hangs in TASK_UNINTERRUPTIBLE state
and doesn't respond to the SIGKILL that the OOM killer has sent it.

That behavior is a consequence of trying to avoid needlessly killing tasks
by giving already-killed tasks time to exit in subsequent OOM conditions.
During the tasklist scan of eligible tasks to kill, if any task is found
to have access to memory reserves that only the OOM killer can provide
(signified by the TIF_MEMDIE thread flag) and it has not yet died, the OOM
killer becomes a complete no-op.
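
A minimal userspace model of the scan described above may make the
deadlock easier to see. It is only a sketch under stated assumptions:
struct sim_task, its badness score and sim_select_victim() are
hypothetical names, the memdie field merely stands in for the TIF_MEMDIE
thread flag, and the kernel's real OOM killer code differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; not the kernel's struct task_struct. */
struct sim_task {
	int pid;
	unsigned long badness; /* hypothetical score: higher = better victim */
	bool memdie;           /* stands in for the TIF_MEMDIE thread flag */
	bool exited;           /* true once the task has actually died */
};

/*
 * Returns the pid of the task to kill, or -1 when the OOM killer should
 * do nothing because an earlier victim (memdie set) has not exited yet.
 */
int sim_select_victim(const struct sim_task *tasks, size_t n)
{
	int victim = -1;
	unsigned long best = 0;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct sim_task *t = &tasks[i];

		if (t->exited)
			continue;
		/* An earlier victim still holds the reserves: back off entirely. */
		if (t->memdie)
			return -1;
		if (victim == -1 || t->badness > best) {
			victim = t->pid;
			best = t->badness;
		}
	}
	return victim;
}
```

If the memdie task is stuck in TASK_UNINTERRUPTIBLE, its exited field
never becomes true, so every later call returns -1 and no other task is
ever chosen: the no-op described above repeats for as long as the hang
lasts.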

This happens on occasion and completely deadlocks the system because the
out-of-memory condition will never be alleviated. With the hang-detection
addition to lockdep, it would be easy to correct this situation. I
understand the primary purpose of the patch is to identify potential
kernel bugs that aren't hardware-induced, but I think it is relevant to
the OOM killer problem until such time as tasks hanging in
TASK_UNINTERRUPTIBLE state become passé.
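
One way hang detection could break the loop, sketched under loose
assumptions: once a TIF_MEMDIE-style victim has failed to exit within a
timeout, treat it as hung and let the scan choose a different task.
HD_HANG_TIMEOUT, struct hd_task, memdie_since and hd_select_victim() are
all hypothetical; this is a userspace illustration, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical grace period (seconds) before a killed task counts as hung. */
#define HD_HANG_TIMEOUT 120L

/* Hypothetical model of a task; not the kernel's struct task_struct. */
struct hd_task {
	int pid;
	unsigned long badness; /* hypothetical score: higher = better victim */
	bool memdie;           /* stands in for the TIF_MEMDIE thread flag */
	bool exited;           /* true once the task has actually died */
	long memdie_since;     /* time (s) the task was granted the reserves */
};

/*
 * Returns the pid of the task to kill, or -1 while a killed task is still
 * within its grace period. Killed tasks past the timeout are presumed hung
 * and skipped, so a fresh victim can be selected.
 */
int hd_select_victim(const struct hd_task *tasks, size_t n, long now)
{
	int victim = -1;
	unsigned long best = 0;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct hd_task *t = &tasks[i];

		if (t->exited)
			continue;
		if (t->memdie) {
			/* Still within its grace period: give it time to exit. */
			if (now - t->memdie_since < HD_HANG_TIMEOUT)
				return -1;
			/* Presumed hung in TASK_UNINTERRUPTIBLE: not a candidate. */
			continue;
		}
		if (victim == -1 || t->badness > best) {
			victim = t->pid;
			best = t->badness;
		}
	}
	return victim;
}
```

Skipping the hung task only helps if the next victim can actually exit
and free memory; the hung task itself keeps whatever it holds.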

David