Re: Process hanging in D state (uninterruptible sleep)

Gadi Oxman (gadio@netvision.net.il)
Tue, 3 Mar 1998 18:38:48 +0300 (IST)


On Tue, 3 Mar 1998, Holger Kiehl wrote:

> Hello!
>
> I have been using Raid 5 with 2.0.x without any problems and now decided
> to use a 2.1.x kernel. All libs have been upgraded as stated in the Changes
> file. With the 2.1.x kernel I have noticed the following problems:
>
> - When I run my benchmark and it gets to the disk test, it
>   hangs at some random point. That process then stays in D
>   state and there is no way for me to remove it. This happens
>   every time I run the benchmark. If I then try to sync the file
>   system, that command (sync) also hangs in D state, and I am
>   unable to unmount the file system (the umount command also hangs
>   in D state).

I'm afraid that the RAID-5 code might not be SMP-safe yet (2.0.x contains a
global SMP kernel lock, and we designed the code with UP in mind). Does the
hang also happen with a kernel built with the "SMP = 1" line commented out?

Another approach we might try is adding a serializing "cpuid"
instruction to each "wait_on_*()"-type function, as Linus suggested
for similar problems.

I have appended a patch that was recently sent to linux-kernel, to which I
have added the same "cpuid" serializing instruction in wait_on_page() in
mm/filemap.c and in wait_on_stripe() in drivers/block/raid5.c.

Does it make a difference on your system?
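
To make the intent concrete, here is a rough user-space sketch of the
wait-loop pattern that the patch touches. It is not kernel code: the
sketch_* names, the tick counter, and the stand-in for schedule() are all
made up for illustration. Only the position of the "cpuid" instruction
(after the task state is set, before the lock is tested) mirrors the patch.
On non-x86 builds the sketch falls back to a generic full barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these exist in the kernel. */
enum { SKETCH_RUNNING = 0, SKETCH_UNINTERRUPTIBLE = 1 };

struct sketch_task { volatile int state; };
struct sketch_obj  { volatile int locked; int hold_ticks; };

/* Execute a serializing instruction so that the preceding store to the
 * task state has completed before the lock flag is read. */
static inline void sketch_serialize(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax = 0, ebx, ecx, edx;

	__asm__ __volatile__("cpuid"
			     : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
			     :
			     : "memory");
	(void)eax; (void)ebx; (void)ecx; (void)edx;
#else
	__sync_synchronize();	/* assumption: a full barrier is close enough */
#endif
}

/* Stand-in for schedule(): pretend the lock holder runs for one tick and
 * releases the lock when its tick count reaches zero. */
static void sketch_schedule(struct sketch_obj *obj)
{
	if (obj->hold_ticks > 0 && --obj->hold_ticks == 0)
		obj->locked = 0;
}

/* Same shape as the wait_on_*() loops in the patch.
 * Returns how many times the caller had to "sleep". */
static int sketch_wait_on(struct sketch_task *tsk, struct sketch_obj *obj)
{
	int sleeps = 0;

	assert(tsk != NULL && obj != NULL);
	for (;;) {
		tsk->state = SKETCH_UNINTERRUPTIBLE;
		sketch_serialize();	/* where the patch adds "cpuid" */
		if (!obj->locked)
			break;
		sketch_schedule(obj);
		sleeps++;
	}
	tsk->state = SKETCH_RUNNING;
	return sleeps;
}
```

The point of the ordering is that the state update must be visible before
the lock test; otherwise a wakeup arriving between the two could be missed
and the task would sleep forever, which is what a D-state hang looks like.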

> - With a 2.0.x kernel, parity was reconstructed when a RAID array was
>   not clean. Kernel 2.1.x does not do this; it just tells me to use
>   ckraid. Is there something wrong with my setup?

We disabled the automatic reconstruction by default in the 2.1.x
kernels since we felt that it still requires a bit more work -- it can
be enabled manually by changing the "#define SUPPORT_RECONSTRUCTION 0"
line in include/linux/md.h to "#define SUPPORT_RECONSTRUCTION 1".

Gadi

> My hardware/software configuration is as follows:
>
> Asus P2L97-DS with 2 PII-233
> Onboard Adaptec 7880 Ultra-Wide with 3 QUANTUM VIKING 2.3 WSE
>
> Kernel is 2.1.88 with SMP enabled
> Distribution is SuSE 5.1
>
> Raid setup is as follows:
> md0 : active raid5 sda6 sdb6 sdc6 2104320 blocks level 5,
> 32k chunk, algorithm 2 [3/3] [UUU]
> md1 : active raid5 sda7 sdb7 sdc7 2056064 blocks level 5,
> 32k chunk, algorithm 2 [3/3] [UUU]
>
> Holger

diff -r -c linux-2.1.88.orig/fs/buffer.c linux/fs/buffer.c
*** linux-2.1.88.orig/fs/buffer.c Thu Feb 19 04:09:47 1998
--- linux/fs/buffer.c Sat Feb 28 01:00:52 1998
***************
*** 138,143 ****
--- 138,144 ----
  	add_wait_queue(&bh->b_wait, &wait);
  repeat:
  	tsk->state = TASK_UNINTERRUPTIBLE;
+ 	__asm__ __volatile__("cpuid": : :"ax", "bx", "cx", "dx", "memory");
  	run_task_queue(&tq_disk);
  	if (buffer_locked(bh)) {
  		schedule();
diff -r -c linux-2.1.88.orig/fs/dquot.c linux/fs/dquot.c
*** linux-2.1.88.orig/fs/dquot.c Mon Jan 12 17:46:24 1998
--- linux/fs/dquot.c Sun Mar 1 08:33:37 1998
***************
*** 173,178 ****
--- 173,179 ----
  	add_wait_queue(&dquot->dq_wait, &wait);
  repeat:
  	current->state = TASK_UNINTERRUPTIBLE;
+ 	__asm__ __volatile__("cpuid": : :"ax", "bx", "cx", "dx", "memory");
  	if (dquot->dq_flags & DQ_LOCKED) {
  		dquot->dq_flags |= DQ_WANT;
  		schedule();
diff -r -c linux-2.1.88.orig/fs/inode.c linux/fs/inode.c
*** linux-2.1.88.orig/fs/inode.c Mon Feb 9 14:34:17 1998
--- linux/fs/inode.c Sat Feb 28 01:03:06 1998
***************
*** 103,108 ****
--- 103,109 ----
  	add_wait_queue(&inode->i_wait, &wait);
  repeat:
  	current->state = TASK_UNINTERRUPTIBLE;
+ 	__asm__ __volatile__("cpuid": : :"ax", "bx", "cx", "dx", "memory");
  	if (inode->i_state & I_LOCK) {
  		schedule();
  		goto repeat;
diff -r -c linux-2.1.88.orig/fs/super.c linux/fs/super.c
*** linux-2.1.88.orig/fs/super.c Fri Jan 30 19:43:11 1998
--- linux/fs/super.c Sun Mar 1 08:34:08 1998
***************
*** 422,427 ****
--- 422,428 ----
  	add_wait_queue(&sb->s_wait, &wait);
  repeat:
  	current->state = TASK_UNINTERRUPTIBLE;
+ 	__asm__ __volatile__("cpuid": : :"ax", "bx", "cx", "dx", "memory");
  	if (sb->s_lock) {
  		schedule();
  		goto repeat;
--- linux/mm/filemap.c~ Tue Mar 3 18:25:30 1998
+++ linux/mm/filemap.c Tue Mar 3 18:25:30 1998
@@ -312,6 +312,7 @@
 	add_wait_queue(&page->wait, &wait);
 repeat:
 	tsk->state = TASK_UNINTERRUPTIBLE;
+	__asm__ __volatile__("cpuid": : :"ax", "bx", "cx", "dx", "memory");
 	run_task_queue(&tq_disk);
 	if (PageLocked(page)) {
 		schedule();
--- linux/drivers/block/raid5.c~ Tue Mar 3 18:25:47 1998
+++ linux/drivers/block/raid5.c Tue Mar 3 18:25:47 1998
@@ -97,6 +97,7 @@
 	add_wait_queue(&sh->wait, &wait);
 repeat:
 	current->state = TASK_UNINTERRUPTIBLE;
+	__asm__ __volatile__("cpuid": : :"ax", "bx", "cx", "dx", "memory");
 	if (stripe_locked(sh)) {
 		schedule();
 		goto repeat;
