Re: [PATCH] fork: report pid reservation failure properly

From: Michal Hocko
Date: Mon Feb 23 2015 - 15:17:10 EST


ping on this one? Should I just resend (your way Andrew)? Or are there
any objections to the patch as is.

On Wed 04-02-15 11:43:48, Michal Hocko wrote:
> copy_process will report any failure in alloc_pid as ENOMEM currently
> which is misleading because the pid allocation might fail not only when
> the memory is short but also when the pid space is consumed already.
>
> The current man page even mentions this case:
> "
> EAGAIN
>
> A system-imposed limit on the number of threads was encountered.
> There are a number of limits that may trigger this error: the
> RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which
> limits the number of processes and threads for a real user ID, was
> reached; the kernel's system-wide limit on the number of processes
> and threads, /proc/sys/kernel/threads-max, was reached (see
> proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max,
> was reached (see proc(5)).
> "
>
> so the current behavior is also incorrect wrt. documentation. POSIX man
> page also suggest returing EAGAIN when the process count limit is
> reached.
>
> This patch simply propagates error code from alloc_pid and makes sure we
> return -EAGAIN due to reservation failure. This will make behavior of
> fork closer to both our documentation and POSIX.
>
> alloc_pid might alsoo fail when the reaper in the pid namespace is dead
> (the namespace basically disallows all new processes) and there is no
> good error code which would match documented ones. We have traditionally
> returned ENOMEM for this case which is misleading as well but as per
> Eric W. Biederman this behavior is documented in man pid_namespaces(7)
> "
> If the "init" process of a PID namespace terminates, the kernel
> terminates all of the processes in the namespace via a SIGKILL signal.
> This behavior reflects the fact that the "init" process is essential for
> the correct operation of a PID namespace. In this case, a subsequent
> fork(2) into this PID namespace will fail with the error ENOMEM; it is
> not possible to create a new processes in a PID namespace whose "init"
> process has terminated.
> "
> and introducing a new error code would be too risky so let's stick to
> ENOMEM for this case.
>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> ---
> kernel/fork.c | 5 +++--
> kernel/pid.c | 15 ++++++++-------
> 2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3ce8d57cff09..c37b88a162c5 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1405,10 +1405,11 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> goto bad_fork_cleanup_io;
>
> if (pid != &init_struct_pid) {
> - retval = -ENOMEM;
> pid = alloc_pid(p->nsproxy->pid_ns_for_children);
> - if (!pid)
> + if (IS_ERR(pid)) {
> + retval = PTR_ERR(pid);
> goto bad_fork_cleanup_io;
> + }
> }
>
> p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 82430c858d69..bbb805ccd893 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -179,7 +179,7 @@ static int alloc_pidmap(struct pid_namespace *pid_ns)
> spin_unlock_irq(&pidmap_lock);
> kfree(page);
> if (unlikely(!map->page))
> - break;
> + return -ENOMEM;
> }
> if (likely(atomic_read(&map->nr_free))) {
> for ( ; ; ) {
> @@ -207,7 +207,7 @@ static int alloc_pidmap(struct pid_namespace *pid_ns)
> }
> pid = mk_pid(pid_ns, map, offset);
> }
> - return -1;
> + return -EAGAIN;
> }
>
> int next_pidmap(struct pid_namespace *pid_ns, unsigned int last)
> @@ -298,17 +298,20 @@ struct pid *alloc_pid(struct pid_namespace *ns)
> int i, nr;
> struct pid_namespace *tmp;
> struct upid *upid;
> + int retval = -ENOMEM;
>
> pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
> if (!pid)
> - goto out;
> + return ERR_PTR(retval);
>
> tmp = ns;
> pid->level = ns->level;
> for (i = ns->level; i >= 0; i--) {
> nr = alloc_pidmap(tmp);
> - if (nr < 0)
> + if (IS_ERR_VALUE(nr)) {
> + retval = nr;
> goto out_free;
> + }
>
> pid->numbers[i].nr = nr;
> pid->numbers[i].ns = tmp;
> @@ -336,7 +339,6 @@ struct pid *alloc_pid(struct pid_namespace *ns)
> }
> spin_unlock_irq(&pidmap_lock);
>
> -out:
> return pid;
>
> out_unlock:
> @@ -348,8 +350,7 @@ out_free:
> free_pidmap(pid->numbers + i);
>
> kmem_cache_free(ns->pid_cachep, pid);
> - pid = NULL;
> - goto out;
> + return ERR_PTR(retval);
> }
>
> void disable_pid_allocation(struct pid_namespace *ns)
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/