Re: do_wait fix for 2.6.10-rc1

From: Klaus Dittrich
Date: Mon Nov 29 2004 - 06:57:28 EST


Sripathi Kodi wrote:

Can you test this patch please?

--
Regards Klaus


Hi Klaus,

I have tested your patch against my testcase and it seems to be alright.

My testcase is right here in my first post on LKML about this: http://lkml.org/lkml/2004/11/5/34! The essence of the testcase is two threads racing in waitpid to collect the exit status of a child. Sometimes the thread that loses this race gets stuck in waitpid forever unless manually interrupted. I had been able to recreate this problem only on a multiprocessor machine. The essence of the fix was to make the thread that loses this race not to set 'flag =1'. The patch you sent me does not alter this behavior, so it works alright.

Thanks and regards,
Sripathi.

I encountered a bug in waitpid after Sripathi's patch
was applied.
See lkml
"kernel reports kill too late? Thu Nov 18 2004 - 07:36:48 EST"

Attached is a patch for do_wait() that may fix both problems.

--
Regards Klaus
--- kernel/exit.ORIG.c 2004-11-24 17:20:32.000000000 +0100
+++ kernel/exit.c 2004-11-26 17:38:36.000000000 +0100
@@ -1316,6 +1316,9 @@ static long do_wait(pid_t pid, int optio
DECLARE_WAITQUEUE(wait, current);
struct task_struct *tsk;
int flag, retval;
+ struct task_struct *p;
+ struct list_head *_p;
+ int ret;

add_wait_queue(&current->wait_chldexit,&wait);
repeat:
@@ -1324,9 +1327,6 @@ repeat:
read_lock(&tasklist_lock);
tsk = current;
do {
- struct task_struct *p;
- struct list_head *_p;
- int ret;

list_for_each(_p,&tsk->children) {
p = list_entry(_p,struct task_struct,sibling);
@@ -1377,10 +1377,11 @@ repeat:
goto end;
break;
}
- flag = 1;
check_continued:
- if (!unlikely(options & WCONTINUED))
+ if (!unlikely(options & WCONTINUED)) {
+ flag = 1;
continue;
+ }
retval = wait_task_continued(
p, (options & WNOWAIT),
infop, stat_addr, ru);