Re: [PATCH 2/7] ns: Introduce the setns syscall

From: Nathan Lynch
Date: Wed May 11 2011 - 15:21:33 EST


Hi Eric,

On Fri, 2011-05-06 at 19:24 -0700, Eric W. Biederman wrote:
> With the networking stack today there is demand to handle
> multiple network stacks at a time. Not in the context
> of containers but in the context of people doing interesting
> things with routing.
>
> There is also demand in the context of containers to have
> an efficient way to execute some code in the container itself.
> If nothing else it is very useful as a debugging technique.
>
> Both problems can be solved by starting some form of login
> daemon in the namespaces people want access to, or you
> can play games by ptracing a process and getting the
> traced process to do things you want it to do. However
> it turns out that a login daemon or a ptrace puppet
> controller are more code, they are more prone to
> failure, and generally they are less efficient than
> simply changing the namespace of a process to a
> specified one.
>
> Pieces of this puzzle can also be solved by, instead of
> coming up with a general purpose system call, coming up
> with targeted system calls, perhaps socketat, that solve
> a subset of the larger problem. Overall that appears
> to be more work for less reward.
>
> int setns(int fd, int nstype);
>
> The fd argument is a file descriptor referring to a proc
> file of the namespace you want to switch the process to.
>
> In the setns system call the nstype is 0 or specifies
> a clone flag of the namespace you intend to change,
> to prevent changing a namespace unintentionally.

I don't understand exactly what the nstype argument buys us - why would
correct code ever need to specify a value other than 0? And reusing the
CLONE_NEW* values in this interface is kind of ugly when setns is
precisely _not_ creating new namespaces.

Is there some fundamental reason it couldn't be

int setns(int fd);

or is there a use case I'm missing?
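
To make the question concrete, here is a minimal userspace sketch of how I would expect a caller to use the two-argument form. It assumes that no libc wrapper is available and goes through syscall(2) directly, that SYS_setns is defined by the installed headers, and that the namespace file lives somewhere under /proc. The helper name join_namespace() is made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: join the namespace referred to by `path`.
 *
 * nstype is 0 (accept whatever namespace type the fd refers to) or a
 * CLONE_NEW* flag that the fd must match; on a mismatch the proposed
 * syscall returns -EINVAL.
 *
 * Returns 0 on success, -1 with errno set on failure.
 */
static int join_namespace(const char *path, int nstype)
{
	int fd, ret, saved_errno;

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Assumption: raw syscall, since a libc wrapper may not exist. */
	ret = (int)syscall(SYS_setns, fd, nstype);

	/* Preserve the setns error across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret;
}
```

A caller that opened the file itself would presumably write join_namespace("/proc/1234/ns/net", 0), where 1234 is a placeholder pid. Passing CLONE_NEWNET instead of 0 only seems to matter when the fd came from somewhere the caller does not trust to hand over the right kind of namespace.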


> +SYSCALL_DEFINE2(setns, int, fd, int, nstype)
> +{
> +	const struct proc_ns_operations *ops;
> +	struct task_struct *tsk = current;
> +	struct nsproxy *new_nsproxy;
> +	struct proc_inode *ei;
> +	struct file *file;
> +	int err;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	file = proc_ns_fget(fd);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +
> +	err = -EINVAL;
> +	ei = PROC_I(file->f_dentry->d_inode);
> +	ops = ei->ns_ops;
> +	if (nstype && (ops->type != nstype))
> +		goto out;
> +
> +	new_nsproxy = create_new_namespaces(0, tsk, tsk->fs);

create_new_namespaces() can fail; shouldn't this be checked?
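
Something along these lines is what I have in mind. This is a userspace mock-up of the control flow only, not a tested kernel patch: the ERR_PTR helpers are simplified stand-ins for the ones in linux/err.h, and stub_create_new_namespaces() and setns_sketch() are hypothetical names standing in for create_new_namespaces() and the syscall body.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes are encoded in the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct nsproxy {
	int placeholder;
};

/* Hypothetical stub standing in for create_new_namespaces(). */
static struct nsproxy *stub_create_new_namespaces(int should_fail)
{
	static struct nsproxy proxy;

	if (should_fail)
		return ERR_PTR(-ENOMEM);
	return &proxy;
}

/* Mirrors the flow of the quoted hunk, with the missing check added. */
static int setns_sketch(int should_fail)
{
	struct nsproxy *new_nsproxy;
	int err = -EINVAL;

	new_nsproxy = stub_create_new_namespaces(should_fail);
	if (IS_ERR(new_nsproxy)) {
		/* Propagate the encoded errno instead of using the pointer. */
		err = (int)PTR_ERR(new_nsproxy);
		goto out;
	}

	/* ops->install() and switch_task_namespaces() would follow here. */
	err = 0;
out:
	/* fput(file) would go here in the real function. */
	return err;
}
```

Without the IS_ERR() test, a failed allocation would hand an error-encoded pointer straight to ops->install().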


> +	err = ops->install(new_nsproxy, ei->ns);
> +	if (err) {
> +		free_nsproxy(new_nsproxy);
> +		goto out;
> +	}
> +	switch_task_namespaces(tsk, new_nsproxy);
> +out:
> +	fput(file);
> +	return err;
> +}
> +

