Re: [TEST PATCH] pid: fix allocating pid 2 for init (was Re: proc_flush_task oops)
From: Eric W. Biederman
Date: Sat Dec 23 2017 - 22:13:09 EST
Alexey Dobriyan <adobriyan@xxxxxxxxx> writes:
> On Fri, Dec 22, 2017 at 08:41:54AM -0600, Eric W. Biederman wrote:
>> Alexey Dobriyan <adobriyan@xxxxxxxxx> writes:
>
>> > unshare
>> > fork
>> > alloc_pid in level 1 succeeds
>> > alloc_pid in level 0 fails, ->idr_next is 2
>> > fork
>> > alloc pid 2
>> > exit
>> >
>> > Reliable reproducer and fail injection patch attached
>> >
>> > I'd say proper fix is allocating pids in the opposite order
>> > so that failure in the last layer doesn't move IDR cursor
>> > in baby child pidns.
>>
>> I agree with you about changing the order. That will make
>> the code simpler and in the second loop actually conforming C code,
>> and fix the immediate problem.
>
> Something like that (barely tested)
I have thought about this some more and I think we can do better.
I don't like the on stack pid_ns array.
The only reason the code calls disable_pid_allocation is that we
don't handle this error.
The semantics of least surprise, are that if we run out of resources
while allocating something, trying again when more resources are
available will make it work.
So it looks like handling the error will improve the quality of the
implemenation, and be a simpler, less dangerous patch.
diff --git a/kernel/pid.c b/kernel/pid.c
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -226,6 +224,10 @@ struct pid *alloc_pid(struct pid_namespace *ns)
while (++i <= ns->level)
idr_remove(&ns->idr, (pid->numbers + i)->nr);
+ /* On failure to allocate the first pid, reset the state */
+ if (ns->pid_allocated == PIDNS_ADDING)
+ idr_set_cursor(&ns->idr, 0);
+
spin_unlock_irq(&pidmap_lock);
kmem_cache_free(ns->pid_cachep, pid);
Eric