[PATCH] pid_ns: Allow to get pid_for_children ns before child_reaper is created
From: Kirill Tkhai
Date: Tue May 23 2017 - 12:30:00 EST
This patch prohibits pid allocation till child_reaper
of pid namespace is set, and it makes possible and safe
to get just unshared pid_ns from "/proc/[pid]/ns/pid_for_children"
file. This may be useful to determine user_ns of such a created
pid_ns, which is not possible now.
It was prohibited till now, because the architecture of pid namespaces
assumes child reaper is the firstly created process of the namespace,
and it initializes pid_namespace::proc_mnt. Child reaper creation
mustn't race with creation of another processes from this namespace,
otherwise a process with pid > 1 may die before pid_namespace::proc_mnt
is populated and it will get a null pointer dereference in proc_flush_task().
Also, child reaper mustn't die before processes from the namespace.
The patch prevents such races. It allows to alloc_pid() only if
ns->child_reaper is already set, and it guarantees, that
pid_namespace::proc_mnt is already populated. Also, we do the assignment
under the tasklist_lock in the copy_process() stage, when it can't fail.
This guarantees the child_reaper will be hashed before the concurrent
process, so the concurrent process can't die before it. When child reaper
dies before the concurrent hashes to task list, fork() of the concurrent
will aborts as it's prohibited after commit 3fd372262166:
"pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()".
So, we can't safely allow to get "/proc/[pid]/ns/pid_for_children"
since it's created.
Signed-off-by: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
CC: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
CC: Oleg Nesterov <oleg@xxxxxxxxxx>
CC: Andy Lutomirski <luto@xxxxxxxxxx>
CC: Serge Hallyn <serge@xxxxxxxxxx>
CC: Michal Hocko <mhocko@xxxxxxxx>
CC: Andrei Vagin <avagin@xxxxxxxxxx>
CC: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
CC: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxxxxx>
CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
---
kernel/pid.c | 3 ++-
kernel/pid_namespace.c | 9 ---------
2 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/kernel/pid.c b/kernel/pid.c
index fd1cde1e4576..eeeb01fdd87c 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -334,7 +334,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
upid = pid->numbers + ns->level;
spin_lock_irq(&pidmap_lock);
- if (!(ns->nr_hashed & PIDNS_HASH_ADDING))
+ if (!(ns->nr_hashed & PIDNS_HASH_ADDING) ||
+ (!ns->child_reaper && !is_child_reaper(pid)))
goto out_unlock;
for ( ; upid >= pid->numbers; --upid) {
hlist_add_head_rcu(&upid->pid_chain,
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 74a5a7255b4d..51dd1d490542 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -385,15 +385,6 @@ static struct ns_common *pidns_for_children_get(struct task_struct *task)
}
task_unlock(task);
- if (ns) {
- read_lock(&tasklist_lock);
- if (!ns->child_reaper) {
- put_pid_ns(ns);
- ns = NULL;
- }
- read_unlock(&tasklist_lock);
- }
-
return ns ? &ns->ns : NULL;
}