Re: [PATCH 2/2] pid_ns: Introduce ioctl to set vector of ns_last_pid's on ns hierarhy

From: Eric W. Biederman
Date: Wed Apr 26 2017 - 12:38:50 EST


Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:

> On 26.04.2017 19:11, Kirill Tkhai wrote:
>> On 26.04.2017 18:53, Oleg Nesterov wrote:
>>> On 04/17, Kirill Tkhai wrote:
>>>>
>>>> +struct pidns_ioc_req {
>>>> +/* Set vector of last pids in namespace hierarchy */
>>>> +#define PIDNS_REQ_SET_LAST_PID_VEC 0x1
>>>> + unsigned int req;
>>>> + void __user *data;
>>>> + unsigned int data_size;
>>>> + char std_fields[0];
>>>> +};
>>>
>>> see below,
>>>
>>>> +static long set_last_pid_vec(struct pid_namespace *pid_ns,
>>>> + struct pidns_ioc_req *req)
>>>> +{
>>>> + char *str, *p;
>>>> + int ret = 0;
>>>> + pid_t pid;
>>>> +
>>>> + read_lock(&tasklist_lock);
>>>> + if (!pid_ns->child_reaper)
>>>> + ret = -EINVAL;
>>>> + read_unlock(&tasklist_lock);
>>>> + if (ret)
>>>> + return ret;
>>>
>>> why do you need to check ->child_reaper under tasklist_lock? this looks pointless.
>>>
>>> In fact I do not understand how it is possible to hit pid_ns->child_reaper == NULL,
>>> there must be at least one task in this namespace, otherwise you can't open a file
>>> which has f_op == ns_file_operations, no?
>>
>> Sure, it's impossible to pick a pid_ns, if there is no the pid_ns's tasks. I added
>> it under impression of
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=dfda351c729733a401981e8738ce497eaffcaa00
>> but here it's completely wrong. It will be removed in v2.
>>
>>>> + if (req->data_size >= PAGE_SIZE)
>>>> + return -EINVAL;
>>>> + str = vmalloc(req->data_size + 1);
>>>
>>> then I don't understand why it makes sense to use vmalloc()
>>>
>>>> + if (!str)
>>>> + return -ENOMEM;
>>>> + if (copy_from_user(str, req->data, req->data_size)) {
>>>> + ret = -EFAULT;
>>>> + goto out_vfree;
>>>> + }
>>>> + str[req->data_size] = '\0';
>>>> +
>>>> + p = str;
>>>> + while (p && *p != '\0') {
>>>> + if (!ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN)) {
>>>> + ret = -EPERM;
>>>> + goto out_vfree;
>>>> + }
>>>> +
>>>> + if (sscanf(p, "%d", &pid) != 1 || pid < 0 || pid > pid_max) {
>>>> + ret = -EINVAL;
>>>> + goto out_vfree;
>>>> + }
>>>
>>> Well, this is ioctl(), do we really want to parse the strings?
>>>
>>> Can't we make
>>>
>>> struct pidns_ioc_req {
>>> ...
>>> int nr_pids;
>>> pid_t pids[0];
>>> }
>>>
>>> and just use get_user() in a loop? This way we can avoid vmalloc() or anything
>>> else altogether.
>>
>> Since it's a generic structure for different types of the requests, it may be extended
>> in the future. We won't be able to add new fields, if we compose the structure the way
>> you suggested, will we?
>
> Though, we may go this way if just do the fields generic:
>
> struct pidns_ioc_req {
> unsigned int req;
> unsigned int data_size;
> union {
> pid_t pid[0];
> };
> };
>
> Ok, I'll rework the patchset in this way.

You don't need that. That is what new ioctl numbers are for.

Interfaces to the kernel don't need to become multiplexors to prepare
for the future when there is already an appropriate multiplexing
interface in place.

Eric