Debugging process hanging in D status and not responding to SIGKILL

From: Christian Hammers
Date: Fri Dec 21 2007 - 11:07:50 EST


Hello

Occasionally all my apache2 processes hang in "D" process status where they are
no longer responsible to SIGKILL which makes the server almost un-rebootable.
The processes usually vanish after about 15-30min.

I know that this usually means "I/O wait" somewhere deep inside a kernel
function where signals are not handled.

But using "strace -p <pid>" I can only see that the last called function is
flock() (according to /proc/<pid>/fd somebody produced a deadlock when using
PHP sessions). But flock() normally terminates on SIGTERM and SIGKILL.

Could it be a problem with my SCSI driver who's involed in the flock process?
The problem usually occurs on two other webserver at the same time, one using
the same hardware and one being a Dell 6850 with Dell PERC RAID, though.

How can I get further debugging information that could give me a hint?

The system:
Kernel: 2.6.22-3-amd64 (modified by Debian distribution)
CPU: Quad Dualcore Intel(R) Xeon(R) E5310 @ 1.60GHz, 8GB RAM
M/B: Transtec Server with Intel 5000 Series Chipset
SCSI: 3ware Inc 9550SX SATA-RAID with firmware FE9X 3.08.00.016

bye,

-christian-
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/