Re: [PATCH] do_wait fix for 2.6.10-rc1

From: Linus Torvalds
Date: Mon Nov 08 2004 - 13:14:19 EST




On Mon, 8 Nov 2004, Linus Torvalds wrote:
>
> In fact, it looks like the "bail_ref" case has _another_ bug: since it
> returns zero, we may still be _using_ the task pointer (to get to the next
> task), so we must NOT do a "put_task_struct()" intil we've re-acquired the
> tasklist_lock.

Actually, this part of the patch is bogus. If our "put_task_struct" is the
last one, it doesn't help at all that we're insidfe the tasklist_lock.
We'll just release the task structure ourselves.

The problem remains, though: "do_wait()" does end up accessing "tsk" in

tsk = next_thread(tsk);

and as far as I can see, "tsk" may be gone by then.

Is there anything else protecting us? This looks like a serious (if
extremely unlikely) bug..

Anyway, here's an updated patch for Sripathi's original problem, with a
better comment, and with the put_task_struct back where it was originally
since it shouldn't matter.

Help me, Obi-Wan McGrath,

Linus

----
===== kernel/exit.c 1.166 vs edited =====
--- 1.166/kernel/exit.c 2004-11-04 11:13:19 -08:00
+++ edited/kernel/exit.c 2004-11-08 08:08:18 -08:00
@@ -1202,6 +1202,18 @@
bail_ref:
put_task_struct(p);
read_lock(&tasklist_lock);
+
+ /*
+ * We are returning to the wait loop without having successfully
+ * removed the process, and a zero value. That means that the wait
+ * will continue - but somebody else might have reaped a child
+ * (maybe this one) that we already decided was eligible.
+ *
+ * As a result, we need to make sure that we don't wait for
+ * children forever - they might be all gone. So mark us
+ * running, and the "schedule()" in do_wait() is fine.
+ */
+ current->state = TASK_RUNNING;
return 0;
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/