Re: uninterruptible sleep lockups

From: linux-os
Date: Tue Feb 22 2005 - 18:44:56 EST


On Tue, 22 Feb 2005, Chris Friesen wrote:

linux-os wrote:

Now, somebody needs a resource. It executes down(&semaphore);
once it gets control again, it has that resource. It attempts
to use that resource through a driver. The driver waits forever.
The resource is now permanently dorked --forever because its
driver is waiting forever. The user code never returns from
the driver so it can never execute up(&semaphore).

What about something like a "robust mutex" (in OSDL terminology)? The guy holding it too long gets killed, and the mutex gets marked as dirty. The next guy to aquire the mutex is responsable for re-initializing the resource (resetting the device to a known state, for instance).

Chris


All wonderful. However, it dosn't fix the problem. You are,
again, assuming that the problem is the symptom! The problem
is that some piece of code is not handling an exception
properly. It is waiting forever for something that will
never happen. It's that CODE that needs to be fixed.

"Cleaning" up the immediate symptoms doesn't let
the next thread that acquires the "cleaned up" lock
use the hardware because it has jammed code between
that thread and the hardware.

The bad code needs to be fixed. If the bad code is
fixed, you will __never__ have a process stuck
in 'D' state unless you run for the 1000 years
that could statistically result in a bit in
the semaphore getting flipped.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/