Re: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm

From: Kees Cook
Date: Wed Sep 21 2022 - 23:34:01 EST


On Tue, Sep 20, 2022 at 05:31:19PM -0700, Andrei Vagin wrote:
> From: Andrei Vagin <avagin@xxxxxxxxx>
>
> Changing a time namespace requires remapping a vvar page, so we don't want
> to allow doing that if any other tasks can use the same mm.
>
> Currently, we install a time namespace when a task is created with a new
> mm. exec() is another case where a task gets a new mm, so it can switch
> time namespaces safely, but this case isn't handled yet.
>
> One more issue of the current interface is that clone() with CLONE_VM isn't
> allowed if the current task has unshared a time namespace
> (timens_for_children doesn't match the current timens).
>
> Both of these issues inconvenience users. For example, Alexey and Florian
> reported that posix_spawn() uses vfork+exec and that this pattern doesn't
> work with time namespaces because of both issues described above. LXC had
> to work around the exec() issue by calling setns().
>
> In commit 133e2d3e81de5 ("fs/exec: allow to unshare a time namespace on
> vfork+exec"), we tried to fix these issues with minimal impact on UAPI, but
> that approach adds extra complexity and some undesirable side effects. Eric
> suggested fixing the issues properly, because there is every reason to
> believe that no users depend on the old behavior.
>
> Cc: Alexey Izbyshev <izbyshev@xxxxxxxxx>
> Cc: Christian Brauner <brauner@xxxxxxxxxx>
> Cc: Dmitry Safonov <0x7f454c46@xxxxxxxxx>
> Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Cc: Florian Weimer <fweimer@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Suggested-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Origin-author: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Signed-off-by: Andrei Vagin <avagin@xxxxxxxxx>
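
For readers following along, the mismatch between the current timens and
timens_for_children described above can be observed from userspace. The
following is only a rough sketch, not part of the patch: it assumes the
/proc/self/ns/time and /proc/self/ns/time_for_children links described in
time_namespaces(7), and the helper names read_ns_link() and
timens_matches_children() are made up for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a namespace symlink such as
 * /proc/self/ns/time into buf as a NUL-terminated string
 * (for example "time:[4026531834]").
 * Returns 0 on success, -1 on failure.
 */
static int read_ns_link(const char *path, char *buf, size_t len)
{
	ssize_t n = readlink(path, buf, len - 1);

	if (n < 0)
		return -1;
	buf[n] = '\0';
	return 0;
}

/*
 * Hypothetical helper: compare the task's current time namespace with
 * the one its children would be placed in.
 * Returns 1 if they match, 0 if they differ, and -1 if either link
 * cannot be read (for example on a kernel without time namespaces).
 */
int timens_matches_children(void)
{
	char cur[64], child[64];

	if (read_ns_link("/proc/self/ns/time", cur, sizeof(cur)) < 0)
		return -1;
	if (read_ns_link("/proc/self/ns/time_for_children",
			 child, sizeof(child)) < 0)
		return -1;

	return strcmp(cur, child) == 0;
}
```

In a task that has not unshared its time namespace, the two links should
name the same namespace and the helper returns 1. After a successful
unshare(CLONE_NEWTIME), the expectation is that only time_for_children
changes, so the helper would return 0, which is the state in which
clone() with CLONE_VM is refused by the interface described above.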

This looks good -- my intention is for this to go into -next after the
v6.1 merge window closes. Does that match everyone's expectations?

Thanks!

-Kees

--
Kees Cook