Re: [PATCH] nsproxy: attach to namespaces via pidfds

From: Christian Brauner
Date: Mon Apr 27 2020 - 12:11:52 EST


On Mon, Apr 27, 2020 at 10:21:55AM -0500, Eric W. Biederman wrote:
>
> I am still catching up on what exists for pidfd. Do you have a way
> to safely go from a pidfd to the corresponding proc directory?

Yep, that's possible. The pidfd's fdinfo file uses the same format for
the Pid: and NSpid: fields as /proc/<pid>/status. Here's, e.g., what
systemd currently does:

int pidfd_get_pid(int fd, pid_t *ret) {
        char path[STRLEN("/proc/self/fdinfo/") + DECIMAL_STR_MAX(int)];
        _cleanup_free_ char *fdinfo = NULL;
        char *p;
        int r;

        if (fd < 0)
                return -EBADF;

        xsprintf(path, "/proc/self/fdinfo/%i", fd);

        r = read_full_file(path, &fdinfo, NULL);
        if (r == -ENOENT) /* if fdinfo doesn't exist we assume the process does not exist */
                return -ESRCH;
        if (r < 0)
                return r;

        p = startswith(fdinfo, "Pid:");
        if (!p) {
                p = strstr(fdinfo, "\nPid:");
                if (!p)
                        return -ENOTTY; /* not a pidfd? */

                p += 5;
        }

        p += strspn(p, WHITESPACE);
        p[strcspn(p, WHITESPACE)] = 0;

        return parse_pid(p, ret);
}
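The snippet above relies on systemd-internal helpers (STRLEN, xsprintf,
read_full_file, startswith, parse_pid, _cleanup_free_). For readers
outside systemd, here is a rough plain-libc sketch of the same idea. The
names parse_fdinfo_pid() and pidfd_get_pid_sketch() are hypothetical, and
it assumes the fdinfo text fits into a single read() and that a
non-positive Pid: value means there is no usable pid.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Extract the "Pid:" field from the text of a pidfd's fdinfo file.
 * Returns 0 and stores the pid in *ret, or a negative errno value. */
static int parse_fdinfo_pid(const char *fdinfo, pid_t *ret)
{
        const char *p;
        char *end;
        long v;

        if (strncmp(fdinfo, "Pid:", 4) == 0) {
                p = fdinfo + 4;
        } else {
                p = strstr(fdinfo, "\nPid:");
                if (!p)
                        return -ENOTTY; /* no Pid: line, probably not a pidfd */
                p += 5;
        }

        p += strspn(p, " \t");
        errno = 0;
        v = strtol(p, &end, 10);
        if (errno != 0 || end == p)
                return -EINVAL;
        if (v <= 0)
                return -ESRCH; /* assumption: no usable pid for this fd */

        *ret = (pid_t)v;
        return 0;
}

/* Read /proc/self/fdinfo/<fd> and extract the pid from it. */
static int pidfd_get_pid_sketch(int fd, pid_t *ret)
{
        char path[64];
        char buf[4096];
        ssize_t n;
        int dfd, saved;

        if (fd < 0)
                return -EBADF;

        snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
        dfd = open(path, O_RDONLY | O_CLOEXEC);
        /* Same assumption as the systemd code: no fdinfo, no process. */
        if (dfd < 0)
                return errno == ENOENT ? -ESRCH : -errno;

        /* Assumption: the fdinfo text fits into a single read(). */
        n = read(dfd, buf, sizeof(buf) - 1);
        saved = errno;
        close(dfd);
        if (n < 0)
                return -saved;
        buf[n] = '\0';

        return parse_fdinfo_pid(buf, ret);
}
```

Splitting the text parsing from the file reading keeps the parser
testable without a live pidfd.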

>
> That would make this setns work just an optimization. A nice one but
> just an optimization.

Hm, I tried to describe why it's not just an optimization, worthwhile as
that alone would be. It gets the number of syscalls down from 14 to a
single one, which is kinda excellent for something as common as
attaching to or exec'ing into a container. But it also gives us a couple
of other nice properties, such as an atomic attach and appearing in all
namespaces at the same time, similar to clone() with all namespace flags
set.
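To make the contrast concrete, here is a rough sketch of both paths. It
assumes the interface takes the pidfd plus a mask of CLONE_NEW* flags,
which is how setns() accepts pidfds on kernels that have this support.
The helper names, the namespace table, and the order in which the legacy
path enters namespaces are all illustrative assumptions (time namespaces
are left out). With the seven namespace types listed here the legacy path
costs 7 open() + 7 setns() = 14 calls (not counting close()), and a
failure halfway through leaves the caller in a mix of old and new
namespaces.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000
#endif

/* Namespace types handled by this sketch (illustrative; order is an assumption). */
static const struct {
        const char *name; /* entry under /proc/<pid>/ns/ */
        int flag;         /* matching CLONE_NEW* flag */
} ns_table[] = {
        { "user",   CLONE_NEWUSER },
        { "cgroup", CLONE_NEWCGROUP },
        { "ipc",    CLONE_NEWIPC },
        { "uts",    CLONE_NEWUTS },
        { "net",    CLONE_NEWNET },
        { "pid",    CLONE_NEWPID },
        { "mnt",    CLONE_NEWNS },
};
#define NS_COUNT (sizeof(ns_table) / sizeof(ns_table[0]))

/* OR together every flag in the table. */
static int all_ns_flags(void)
{
        int flags = 0;
        for (size_t i = 0; i < NS_COUNT; i++)
                flags |= ns_table[i].flag;
        return flags;
}

/* Legacy path: one open() plus one setns() per namespace. All fds are
 * opened first because entering the mount namespace can change what
 * /proc refers to. Real tools need to choose the order more carefully.
 * Returns 0, or -1 with errno set. */
static int attach_legacy(pid_t pid)
{
        int fds[NS_COUNT];
        char path[64];
        size_t i, opened = 0;
        int ret = 0, saved;

        for (i = 0; i < NS_COUNT; i++) {
                snprintf(path, sizeof(path), "/proc/%d/ns/%s",
                         (int)pid, ns_table[i].name);
                fds[i] = open(path, O_RDONLY | O_CLOEXEC);
                if (fds[i] < 0) {
                        ret = -1;
                        break;
                }
                opened++;
        }

        for (i = 0; ret == 0 && i < NS_COUNT; i++)
                if (setns(fds[i], ns_table[i].flag) < 0)
                        ret = -1; /* earlier setns() calls are NOT undone */

        saved = errno;
        for (i = 0; i < opened; i++)
                close(fds[i]);
        errno = saved;
        return ret;
}

/* pidfd path: a single setns() with a mask of namespace flags. */
static int attach_via_pidfd(int pidfd)
{
        return setns(pidfd, all_ns_flags());
}
```

Entering the pid namespace only affects children created afterwards in
either path, so attach/exec tools still need a fork() after the attach.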

Thanks!
Christian