Re: [PATCH 3/3] pids: Make it possible to clone tasks with given pids

From: Pavel Emelyanov
Date: Fri Nov 11 2011 - 05:11:55 EST


>>> The child_tidptr points to an array of pids for current namespace and
>>> its ancestors. When 0 is met in this array the pid number for the
>>> corresponding namespace is generated, rather than set.
>>
>> I must have missed something, but I can't understand how this works.
>>
>>> For security reasons after a regular clone/fork is done in a namespace
>>> further cloning with predefined pid is not allowed.
>>
>> I guess, this is pid_ns->last_pid != 0 check in set_pidmap(), right ?

Thanks for the feedback, Oleg! Please see my explanation below.

>>> +static int set_pidmap(struct pid_namespace *pid_ns, int pid)
>>> +{
>>> + int offset;
>>> + struct pidmap *map;
>>> +
>>> + offset = pid & BITS_PER_PAGE_MASK;
>>> + map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
>>> +
>>> + if (unlikely(!map->page))
>>> + if (alloc_pidmap_page(map))
>>> + return -ENOMEM;
>>> +
>>> + if (pid_ns->last_pid != 0)
>>> + return -EPERM;
>>
>> OK, but it should be always true, no? IOW, set_pidmap() should always
>> fail?
>>
>> Unless: you are using CLONE_NEWPID along with CLONE_CHILD_USEPIDS and
>> this child_tidptr array has only one pid (before zero pid).
>
> And, if you do clone(CLONE_NEWPID | CLONE_CHILD_USEPIDS), then
> new_ns->child_reaper == NULL (unless you pass "1" in child_tidptr[]) ?
>
>> So, could you please explain what I have missed?
>
> please ;) I guess I misread this patch completely. Help!

This is how I plan to use this functionality.

When creating the init of a container being restored, I call

pids[0] = 1;
pids[1] = 0;

clone(CLONE_NEWPID | CLONE_CHILD_USEPIDS, &pids);

At this point the newly created namespace has last_pid == 0, so it allows this
init to be created. The new "init" task then has to read the pids from the
image files and call

pids[0] = <pid>;
pids[1] = 0;

clone(CLONE_CHILD_USEPIDS, &pids);

one by one. At this point last_pid is still 0, so these new tasks are created
with the given pids. Any newly created task that has children of its own has to
run the same code snippet for them.
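
To make the ordering concrete, here is a rough user-space sketch of that walk.
It is only a model: struct image_task, struct restore_log and
fake_clone_with_pid() are names made up for illustration, the stub merely
records the requested pid instead of calling clone(), and the whole tree is
walked in one process, whereas in the real restore each created task would run
the loop for its own children.

```c
#include <stddef.h>

#define MAX_TASKS 16    /* arbitrary bound for the model */

/* Hypothetical in-memory form of one task read from the image files. */
struct image_task {
    int pid;                            /* pid the task must get back */
    const struct image_task *children;  /* tasks it has to re-create */
    size_t nr_children;
};

/* Records the order in which pids were requested. */
struct restore_log {
    int order[MAX_TASKS];
    size_t n;
};

/* Stand-in for: pids[0] = pid; pids[1] = 0; clone(CLONE_CHILD_USEPIDS, &pids); */
static int fake_clone_with_pid(struct restore_log *log, int pid)
{
    if (log->n >= MAX_TASKS)
        return -1;
    log->order[log->n++] = pid;
    return 0;
}

/* Create every child of t with its saved pid, then descend into that child. */
static int restore_children(struct restore_log *log, const struct image_task *t)
{
    for (size_t i = 0; i < t->nr_children; i++) {
        const struct image_task *c = &t->children[i];

        if (fake_clone_with_pid(log, c->pid))
            return -1;
        if (restore_children(log, c))
            return -1;
    }
    return 0;
}
```

The point of the sketch is only that a parent is always created before its
children, so every task is cloned by the task that is supposed to be its parent.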

After the restore is completed and new tasks are fork()-ed, last_pid finally
gets updated, and any further CLONE_CHILD_USEPIDS fails with EPERM in this
namespace, which prevents pid confusion.

For init_pid_ns, last_pid is set to a non-zero value early at boot (when
kthreadd is created), so pid abuse isn't possible on a non-containerized system
from the very start.
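
The gating rule can be modelled in user space roughly as follows. This is a toy
sketch, not the patch itself: struct model_pid_ns, model_set_pid() and
model_alloc_pid() are invented names, the pid bitmap is a plain byte array, and
the -EINVAL and -EBUSY returns are my assumptions for the model. Only the
last_pid != 0 => -EPERM check mirrors set_pidmap() from the patch.

```c
#include <errno.h>
#include <string.h>

#define MODEL_PID_MAX 1024  /* hypothetical small limit for the model */

/* Toy stand-in for struct pid_namespace: only what the gate needs. */
struct model_pid_ns {
    int last_pid;                       /* 0 until the first regular fork */
    unsigned char used[MODEL_PID_MAX];  /* 1 if the pid is taken */
};

/* A freshly created namespace starts with last_pid == 0 and no pids used. */
static void model_ns_init(struct model_pid_ns *ns)
{
    memset(ns, 0, sizeof(*ns));
}

/* Models clone(CLONE_CHILD_USEPIDS): allowed only while last_pid == 0. */
static int model_set_pid(struct model_pid_ns *ns, int pid)
{
    if (pid <= 0 || pid >= MODEL_PID_MAX)
        return -EINVAL;     /* assumption of the model */
    if (ns->last_pid != 0)
        return -EPERM;      /* the check from set_pidmap() */
    if (ns->used[pid])
        return -EBUSY;      /* assumption of the model */
    ns->used[pid] = 1;
    return pid;             /* note: last_pid is left untouched */
}

/* Models a regular fork(): take the next free pid and update last_pid. */
static int model_alloc_pid(struct model_pid_ns *ns)
{
    for (int pid = ns->last_pid + 1; pid < MODEL_PID_MAX; pid++) {
        if (!ns->used[pid]) {
            ns->used[pid] = 1;
            ns->last_pid = pid;
            return pid;
        }
    }
    return -EAGAIN;
}
```

In this model a namespace that has already done one regular allocation (as
init_pid_ns has after kthreadd is created) rejects every model_set_pid() call,
while a fresh namespace accepts them until the first model_alloc_pid().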

Does this sound OK?

> Oleg.