Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

From: Ingo Molnar
Date: Sun Dec 02 2007 - 15:26:33 EST



* Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:

> > .. and it's even a tool to show where we missed making something
> > TASK_KILLABLE... anything that triggers from NFS and the like really
> > ought to be TASK_KILLABLE after all. This patch will point any
> > omissions out quite nicely without having to do any kind of
> > destructive testing.
>
> It would be better to just audit the source for those. [...]

that was wishful thinking 10 years ago already, when Linux was 10 times
smaller.

> [...] Outlawing something which was previously legal without auditing
> the source is bad.

to the contrary, being 120+ seconds uninterruptible without a very good
reason is certainly something that was unreasonable (and harmful) for a
long time already - we just never had the mechanism to warn about this
intelligently without crashing the system.

Out of direct experience, 95% of the "too long delay" cases are plain
old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could
annotate if it _really_ needs to be TASK_UNINTERRUPTIBLE.

> Anyways, i suspect it would just lead to more people disabling
> softlockup. I remember during some older stress testing it also tended
> to explode regularly, so e.g. SUSE kernel rpms have it disabled. That
> patch would probably make it worse.

There are no softlockup false positive bugs open at the moment. If you
know about any, then please do not hesitate and report them, i'll be
eager to fix them. The softlockup detector is turned on by default in
Fedora (alongside lockdep in rawhide), and it helped us find countless
number of bugs. You are the first person to suggest that it's somehow
harmful.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/