[GIT PULL] thread changes for v5.8

From: Christian Brauner
Date: Tue Jun 02 2020 - 08:20:04 EST


Hi Linus,

Here are the changes for v5.8:

/* Summary */
We have been discussing using pidfds to attach to namespaces for quite a while, and the patches have
existed in one form or another for about a year. I wanted to wait and see how the general API would
be received and adopted first.
This PR contains the changes that make it possible to use pidfds to attach to the namespaces of a
process, i.e. they can be passed as the first argument to the setns() syscall. When only a single
namespace type is specified the semantics are equivalent to passing an nsfd. That means
setns(nsfd, CLONE_NEWNET) equals setns(pidfd, CLONE_NEWNET). However, when a pidfd is passed,
multiple namespace flags can be specified in the second setns() argument and setns() will attach the
caller to all of the specified namespaces at once or to none of them. Specifying 0 is not valid
together with a pidfd. Here are just two obvious examples:

	setns(pidfd, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET);
	setns(pidfd, CLONE_NEWUSER);

Allowing callers to attach to subsets of namespaces as well supports various use-cases where callers
setns to a subset of namespaces to retain privilege, perform an action, and then re-attach to
another subset of namespaces.
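
To make the calling convention concrete, here is a minimal userspace sketch. It is not taken from
the series itself. It assumes a v5.8 or later kernel and a caller with sufficient privileges over
the target namespaces. attach_to_namespaces() and sys_pidfd_open() are hypothetical helper names,
and the fallback syscall number is an assumption for older headers (434 on most architectures).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumption: generic syscall number, not valid on alpha */
#endif

/* Raw syscall wrapper: older glibc versions ship no pidfd_open() wrapper. */
static int sys_pidfd_open(pid_t pid, unsigned int flags)
{
	return (int)syscall(SYS_pidfd_open, pid, flags);
}

/*
 * Hypothetical helper: attach the caller to the namespaces of @pid selected
 * by @nsflags (an OR of CLONE_NEW* flags). Either all of the requested
 * namespaces are joined or none of them are.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int attach_to_namespaces(pid_t pid, int nsflags)
{
	int pidfd, ret, saved_errno;

	/* The kernel rejects 0 together with a pidfd; fail early here too. */
	if (nsflags == 0) {
		errno = EINVAL;
		return -1;
	}

	pidfd = sys_pidfd_open(pid, 0);
	if (pidfd < 0)
		return -1;

	/* A single call covers every namespace type named in nsflags. */
	ret = setns(pidfd, nsflags);

	/* Preserve the errno from setns() across close(). */
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;

	return ret;
}
```

A caller following the subset pattern might first call attach_to_namespaces(pid, CLONE_NEWNET),
do its work, and later call attach_to_namespaces(pid, CLONE_NEWNS | CLONE_NEWPID).
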
Apart from significantly reducing the number of syscalls needed to attach to all currently supported
namespaces (8 * open(/proc/<pid>/ns/<ns>) + 8 * setns(nsfd) vs 1 * setns(pidfd, <ns-flags>)), this
also allows atomic setns to a set of namespaces, i.e. either attaching to all namespaces succeeds or
we fail without having changed anything. This is centered around a new internal struct nsset which
holds all information necessary for a task to switch to a new set of namespaces atomically. Fwiw,
with this change a pidfd becomes the only token needed to interact with a container. I'm expecting
this to be picked up by util-linux for nsenter rather soon. Associated with this change is a shiny
new test-suite dedicated to setns() (for pidfds and nsfds alike).
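
For comparison, the nsfd-based sequence that a single setns(pidfd, <ns-flags>) call replaces can be
sketched roughly as follows. This is an illustrative sketch, not code from the series: ns_table,
ns_path() and attach_one_by_one() are hypothetical names, the join order is a simplification, and
the CLONE_NEWTIME fallback value is an assumption for older headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080 /* assumption: headers predate time namespaces */
#endif

struct ns_entry {
	const char *name; /* entry name under /proc/<pid>/ns/ */
	int flag;         /* matching CLONE_NEW* flag */
};

/* The eight namespace types referred to in the summary above. */
static const struct ns_entry ns_table[] = {
	{ "user",   CLONE_NEWUSER },
	{ "cgroup", CLONE_NEWCGROUP },
	{ "ipc",    CLONE_NEWIPC },
	{ "uts",    CLONE_NEWUTS },
	{ "net",    CLONE_NEWNET },
	{ "pid",    CLONE_NEWPID },
	{ "time",   CLONE_NEWTIME },
	{ "mnt",    CLONE_NEWNS },
};
#define NS_COUNT (sizeof(ns_table) / sizeof(ns_table[0]))

/* Build "/proc/<pid>/ns/<name>"; returns 0 on success, -1 on truncation. */
static int ns_path(char *buf, size_t len, pid_t pid, const char *name)
{
	int n = snprintf(buf, len, "/proc/%d/ns/%s", (int)pid, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * One open() plus one setns() per namespace type. Not atomic: if a later
 * setns() fails, the namespaces joined before it stay joined.
 */
static int attach_one_by_one(pid_t pid)
{
	int fds[NS_COUNT];
	char path[64];
	size_t i, opened = 0;
	int ret = -1;

	/* Open every nsfd first, while /proc is still the caller's own view. */
	for (i = 0; i < NS_COUNT; i++) {
		if (ns_path(path, sizeof(path), pid, ns_table[i].name) < 0)
			goto out;
		fds[i] = open(path, O_RDONLY | O_CLOEXEC);
		if (fds[i] < 0)
			goto out;
		opened++;
	}

	for (i = 0; i < NS_COUNT; i++) {
		if (setns(fds[i], ns_table[i].flag) < 0)
			goto out; /* earlier joins are not rolled back */
	}
	ret = 0;
out:
	assert(opened <= NS_COUNT);
	for (i = 0; i < opened; i++)
		close(fds[i]);
	return ret;
}
```

The early-exit path in the second loop is where the lack of atomicity shows: the caller can end up
attached to only some of the target's namespaces, which is the situation the pidfd variant avoids.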

The following changes since commit 0e698dfa282211e414076f9dc7e83c1c288314fd:

Linux 5.7-rc4 (2020-05-03 14:56:04 -0700)

are available in the Git repository at:

git@xxxxxxxxxxxxxxxxxxx:pub/scm/linux/kernel/git/brauner/linux tags/threads-v5.8

for you to fetch changes up to 2b40c5db73e239531ea54991087f4edc07fbb08e:

selftests/pidfd: add pidfd setns tests (2020-05-13 11:41:22 +0200)

/* Testing */
All patches are based on v5.7-rc4 and have been sitting in linux-next. No build failures or
warnings were observed. All old and new tests are passing.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from linux-next and no merge
conflicts showed up doing a test-merge with current mainline 359287765c04 ("Merge branch
'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs").

Please consider pulling these changes from the signed threads-v5.8 tag.

Thanks!
Christian

----------------------------------------------------------------
threads-v5.8

----------------------------------------------------------------
Christian Brauner (3):
      nsproxy: add struct nsset
      nsproxy: attach to namespaces via pidfds
      selftests/pidfd: add pidfd setns tests

 fs/namespace.c                                   |  15 +-
 fs/nsfs.c                                        |   5 +
 include/linux/mnt_namespace.h                    |   2 +
 include/linux/nsproxy.h                          |  24 ++
 include/linux/proc_fs.h                          |   2 +
 include/linux/proc_ns.h                          |   4 +-
 ipc/namespace.c                                  |   7 +-
 kernel/cgroup/namespace.c                        |   5 +-
 kernel/nsproxy.c                                 | 305 +++++++++++++--
 kernel/pid_namespace.c                           |   5 +-
 kernel/time/namespace.c                          |   5 +-
 kernel/user_namespace.c                          |   8 +-
 kernel/utsname.c                                 |   5 +-
 net/core/net_namespace.c                         |   5 +-
 tools/testing/selftests/pidfd/.gitignore         |   1 +
 tools/testing/selftests/pidfd/Makefile           |   3 +-
 tools/testing/selftests/pidfd/config             |   6 +
 tools/testing/selftests/pidfd/pidfd_setns_test.c | 473 +++++++++++++++++++++++
 18 files changed, 833 insertions(+), 47 deletions(-)
 create mode 100644 tools/testing/selftests/pidfd/config
 create mode 100644 tools/testing/selftests/pidfd/pidfd_setns_test.c