Re: [PATCH 2/2] pid_ns: Introduce ioctl to set vector of ns_last_pid's on ns hierarchy

From: Eric W. Biederman
Date: Thu Apr 27 2017 - 12:46:18 EST


Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:

> On 27.04.2017 19:12, Oleg Nesterov wrote:
>> On 04/26, Kirill Tkhai wrote:
>>>
>>> On 26.04.2017 18:53, Oleg Nesterov wrote:
>>>>
>>>>> +static long set_last_pid_vec(struct pid_namespace *pid_ns,
>>>>> +			     struct pidns_ioc_req *req)
>>>>> +{
>>>>> +	char *str, *p;
>>>>> +	int ret = 0;
>>>>> +	pid_t pid;
>>>>> +
>>>>> +	read_lock(&tasklist_lock);
>>>>> +	if (!pid_ns->child_reaper)
>>>>> +		ret = -EINVAL;
>>>>> +	read_unlock(&tasklist_lock);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>
>>>> why do you need to check ->child_reaper under tasklist_lock? this looks pointless.
>>>>
>>>> In fact I do not understand how it is possible to hit pid_ns->child_reaper == NULL,
>>>> there must be at least one task in this namespace, otherwise you can't open a file
>>>> which has f_op == ns_file_operations, no?
>>>
>>> Sure, it's impossible to pick a pid_ns if it has no tasks. I added
>>> it under the impression of
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=dfda351c729733a401981e8738ce497eaffcaa00
>>> but here it's completely wrong. It will be removed in v2.
>>
>> Hmm. But if I read this commit correctly then we really need to check
>> pid_ns->child_reaper != NULL ?
>>
>> Currently we can't pick an "empty" pid_ns. But after the commit above a task
>> can do sys_unshare(CLONE_NEWPID), another (or the same) task can open its
>> /proc/$pid/ns/pid_for_children and call ns_ioctl() before the 1st alloc_pid() ?
>
> Another task can't open /proc/$pid/ns/pid_for_children before the 1st alloc_pid(),
> because pid_for_children is available to open only after the 1st alloc_pid().
> So, it's impossible to call ioctl() on it.

That sounds reasonable.

There is definitely a chance of the child_reaper dying after we have
joined a pid namespace. So child_reaper can be stale even if it is not NULL.

As long as we don't mess up the first pid allocation, I don't
see any reason why we should care about last_pid in a pid_namespace.
And this ioctl can be used to set all of the other pids on the first
pid allocation by calling it in the parent pid namespace.

There is still the chance of racing with a pid reaper dying. Why do we
care about child_reaper in this case?

Changing last_pid is completely pointless if child_reaper is dead or
missing, but why would we care?

Although, looking at it, we probably want to call set_last_pid() just to
be consistent with everything else.
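
To make the argument concrete, here is a rough user-space toy model, not
kernel code. Every name in it (toy_pid_ns, toy_alloc_pid,
toy_set_last_pid_vec, has_reaper) is made up for illustration, and it
assumes the vector is applied innermost namespace first, walking up
through the parents. It only shows that setting last_pid at each level
never needs to look at the reaper.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct pid_namespace; NOT the kernel structure. */
struct toy_pid_ns {
	struct toy_pid_ns *parent;	/* NULL for the root namespace */
	int last_pid;			/* last pid handed out at this level */
	int has_reaper;			/* deliberately never consulted below */
};

/* Next pid is last_pid + 1 (wrap-around and pids in use are ignored). */
static int toy_alloc_pid(struct toy_pid_ns *ns)
{
	return ++ns->last_pid;
}

/*
 * Set last_pid on ns and its ancestors from vec[0..nr-1], innermost first.
 * Returns 0 on success, or -1 if the hierarchy is shallower than nr or a
 * value is negative. Nothing is modified on failure.
 */
static int toy_set_last_pid_vec(struct toy_pid_ns *ns, const int *vec,
				size_t nr)
{
	struct toy_pid_ns *p = ns;
	size_t i;

	assert(vec != NULL || nr == 0);

	/* Validate the whole vector first so we never half-update. */
	for (i = 0; i < nr; i++, p = p->parent)
		if (!p || vec[i] < 0)
			return -1;

	/* Apply: one entry per level, walking towards the root. */
	for (i = 0, p = ns; i < nr; i++, p = p->parent)
		p->last_pid = vec[i];

	return 0;
}
```

With a child namespace whose has_reaper is 0 and a vector {41, 999}, the
next toy_alloc_pid() gives 42 in the child and 1000 in its parent. A dead
or missing reaper makes the update useless, not unsafe.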

Eric