Re: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the->child_reaper changing

From: Serge E. Hallyn
Date: Tue Aug 26 2008 - 17:28:35 EST


Quoting Oleg Nesterov (oleg@xxxxxxxxxx):
> zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
>
> Yes, we have already killed all tasks in this namespace, and sys_wait4()
> doesn't see any child. But this doesn't mean ->children list is empty,
> we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
> case the subsequent forget_original_parent() will crash the kernel because
> it will try to re-parent these tasks to the NULL reaper.
>
> Even if there are no childs, it is not good that forget_original_parent()
> uses reaper == NULL.
>
> Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
> We could use pid_ns->parent->child_reaper as well, I think this does not
> really matter. These EXIT_DEAD tasks are not visible to the new ->parent
> after re-parenting, they will silently do release_task() eventually.
>
> Note that we must change ->child_reaper, otherwise forget_original_parent()
> will use reaper == father, and in that case we will hit the (correct)
> BUG_ON(!list_empty(&father->children)).
>
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>

Well while it looked correct to me all along, I couldn't get the
testcase to cause an oops. But I see where do_exit() calls
zap_pid_ns_processes() (through group_exit()) before
forget_original_parent(), which can/should end up dereferencing
the null pid_ns, so clearly this is needed and correct.

Acked-by: Serge Hallyn <serue@xxxxxxxxxx>

Thanks, Oleg.

-serge

> --- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
> +++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
> @@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
> rc = sys_wait4(-1, NULL, __WALL, NULL);
> } while (rc != -ECHILD);
>
> -
> - /* Child reaper for the pid namespace is going away */
> - pid_ns->child_reaper = NULL;
> + /*
> + * We can not clear ->child_reaper or leave it alone.
> + * There may by stealth EXIT_DEAD tasks on ->children,
> + * forget_original_parent() must move them somewhere.

Actually this comment is a little bit misleading - the null
deref will happen regardless of whether there are children,
right?

> + */
> + pid_ns->child_reaper = init_pid_ns.child_reaper;
> acct_exit_ns(pid_ns);
> return;
> }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/