Re: Potentially undesirable interactions between vfork() and time namespaces

From: Christian Brauner
Date: Fri Sep 09 2022 - 03:55:05 EST


On Thu, Sep 08, 2022 at 05:13:08PM -0500, Eric W. Biederman wrote:
> Christian Brauner <brauner@xxxxxxxxxx> writes:
>
> > On Wed, Sep 07, 2022 at 10:15:51AM -0700, Andrei Vagin wrote:
> >> On Wed, Sep 07, 2022 at 08:33:20AM +0300, Alexey Izbyshev wrote:
> >> > >
> >> > > That is something to be double checked.
> >> > >
> >> > > I can't see where it would make sense to unshare a time namespace and
> >> > > then call exec, instead of calling exit. So I suspect we can just
> >> > > change this behavior and no one will notice.
> >> > >
> >> > One can imagine a helper binary that calls unshare, forks some children in
> >> > new namespaces, and then calls exec to hand off actual work to another
> >> > binary (which might not expect being in the new time namespace). I'm purely
> >> > theorizing here, however. Keeping a special case for vfork() based only on
> >> > FUD is likely a net negative, so it'd be nice to hear actual time namespace
> >> > users speak up, and switch to the solution you suggested if they don't care.
> >>
> >> I can speak for one tool that uses time namespaces for the right
> >> reasons. It is CRIU. When a process is restored, the monotonic and
> >> boottime clocks have to be adjusted to match old values. That is what
> >> timens was designed for. These changes don't affect CRIU.
> >>
> >> Honestly, I haven't heard about other users of timens yet. I don't take
> >> into account tools like unshare.
> >
> > LXC/LXD does
> >
> > unshare(CLONE_NEWTIME)
> > // write offsets to /proc/self/timens_offsets
> > timens_fd = open("/proc/self/ns/time_for_children", O_RDONLY | O_CLOEXEC)
> > setns(timens_fd, CLONE_NEWTIME)
> > exec(payload)
> >
> > so I agree: please don't change the uapi.
> >
> > But as you can see what we do is basically emulating changing time
> > namespace during exec via the setns() prior to the exec call.
>
> If I understand the description of lxc/lxd correctly, the proposed change
> will not affect lxc/lxd, as the time namespace is already installed
> before exec. If anything what is proposed would potentially allow
> lxc/lxd to be simplified in the future by removing the setns.
>
> Are you then requesting the behavior of the time namespace not change
> when the proposed change will not affect lxc/lxd?

My request is only that /proc/self/ns/time_for_children not be renamed.
As stated above, we already emulate the proposed exec behavior in
userspace, so that part is fine.
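
For reference, the sequence quoted above could be sketched in C roughly as
follows. This is only an illustration under assumptions, not the actual
LXC/LXD code: the helper names format_timens_offset() and
enter_timens_and_exec() are made up for this mail, the offsets are passed in
as whole seconds, and error handling is reduced to returning -1.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080 /* value from <linux/sched.h>, for older libc headers */
#endif

/*
 * Hypothetical helper: format one line for /proc/self/timens_offsets,
 * i.e. "<clock> <secs> <nsecs>\n". Returns the line length, or -1 if
 * the buffer is too small.
 */
int format_timens_offset(char *buf, size_t len, const char *clock,
                         long long secs, long nsecs)
{
    int n = snprintf(buf, len, "%s %lld %ld\n", clock, secs, nsecs);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Hypothetical helper mirroring the quoted steps: unshare, write the
 * offsets, setns() into time_for_children, then exec the payload.
 * Returns -1 on any failure; does not return on success.
 */
int enter_timens_and_exec(const char *path, char *const argv[],
                          long long monotonic_secs, long long boottime_secs)
{
    char line[64];
    int fd, n, ret;

    /* Creates a new time namespace for children; the caller stays put. */
    if (unshare(CLONE_NEWTIME) < 0)
        return -1;

    /* Offsets can only be set before any task has entered the namespace. */
    fd = open("/proc/self/timens_offsets", O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    n = format_timens_offset(line, sizeof(line), "monotonic", monotonic_secs, 0);
    if (n < 0 || write(fd, line, (size_t)n) != n) {
        close(fd);
        return -1;
    }
    n = format_timens_offset(line, sizeof(line), "boottime", boottime_secs, 0);
    if (n < 0 || write(fd, line, (size_t)n) != n) {
        close(fd);
        return -1;
    }
    close(fd);

    /* Move the calling task itself into the namespace it just created. */
    fd = open("/proc/self/ns/time_for_children", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ret = setns(fd, CLONE_NEWTIME);
    close(fd);
    if (ret < 0)
        return -1;

    execv(path, argv);
    return -1; /* only reached if execv() failed */
}
```

The setns() step is the part that emulates switching the time namespace at
exec time, and it is also why the time_for_children name matters to us.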