Re: Potentially undesirable interactions between vfork() and time namespaces

From: Andrei Vagin
Date: Tue Aug 30 2022 - 21:18:40 EST


On Tue, Aug 30, 2022 at 12:49 PM Alexey Izbyshev <izbyshev@xxxxxxxxx> wrote:
>
> Hi,
>
> I've looked at Andrei's patch[1] that permitted vfork() after
> unshare(CLONE_NEWTIME) and noticed a couple of odd things that I'd like
> to point out.
>
> /*
> * If the new process will be in a different time namespace
> * do not allow it to share VM or a thread group with the forking
> task.
> + *
> + * On vfork, the child process enters the target time namespace only
> + * after exec.
> */
> - if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
> + if ((clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM) {
> if (nsp->time_ns != nsp->time_ns_for_children)
> return ERR_PTR(-EINVAL);
> }
>
> This change permits not only a normal vfork(), but also
> clone(CLONE_VM|CLONE_VFORK|CLONE_SIGHAND|CLONE_THREAD). I'm not sure
> whether it can cause real harm, but it's pretty inconsistent to forbid
> creation of normal threads after unshare(CLONE_NEWTIME), but permit such
> weird ones, so maybe the check should be strengthened.

Good catch. I was not aware that CLONE_VFORK is allowed to be used with
CLONE_THREAD. I will send a fix. Thanks.

>
> Also, if such a thread execs, no time namespace switch will happen
> because it's vfork_done field will be cleared when its creator (a
> sibling thread) is killed by de_thread().
>
> + vfork = !!tsk->vfork_done;
> old_mm = current->mm;
> exec_mm_release(tsk, old_mm);
> if (old_mm)
> @@ -1030,6 +1033,10 @@ static int exec_mmap(struct mm_struct *mm)
> tsk->mm->vmacache_seqnum = 0;
> vmacache_flush(tsk);
> task_unlock(tsk);
> +
> + if (vfork)
> + timens_on_fork(tsk->nsproxy, tsk);
> +
>
> Similarly, even after a normal vfork(), time namespace switch could be
> silently skipped if the parent dies before "tsk->vfork_done" is read.
> Again, I don't know whether anybody cares, but this behavior seems
> non-obvious and probably unintended to me.

This is the more interesting case. I will try to find out how we can
handle it properly.

Thanks,
Andrei