Potentially undesirable interactions between vfork() and time namespaces

From: Alexey Izbyshev
Date: Tue Aug 30 2022 - 15:49:56 EST


Hi,

I've looked at Andrei's patch[1] that permitted vfork() after unshare(CLONE_NEWTIME) and noticed a couple of odd things that I'd like to point out.

 	/*
 	 * If the new process will be in a different time namespace
 	 * do not allow it to share VM or a thread group with the forking task.
+	 *
+	 * On vfork, the child process enters the target time namespace only
+	 * after exec.
 	 */
-	if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
+	if ((clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM) {
 		if (nsp->time_ns != nsp->time_ns_for_children)
 			return ERR_PTR(-EINVAL);
 	}

This change permits not only a normal vfork(), but also clone(CLONE_VM|CLONE_VFORK|CLONE_SIGHAND|CLONE_THREAD): because CLONE_VFORK is set, the masked flags no longer equal CLONE_VM, so the check is bypassed. I'm not sure whether this can cause real harm, but it's inconsistent to forbid creating normal threads after unshare(CLONE_NEWTIME) while permitting such odd ones, so maybe the check should be strengthened, e.g. by still rejecting CLONE_THREAD.

Also, if such a thread execs, no time namespace switch will happen: its creator (a sibling thread) is killed by de_thread() before exec_mmap() runs, and the creator clears the child's vfork_done field as it leaves the vfork wait, so the check below always sees NULL.

+	vfork = !!tsk->vfork_done;
 	old_mm = current->mm;
 	exec_mm_release(tsk, old_mm);
 	if (old_mm)
@@ -1030,6 +1033,10 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
+
+	if (vfork)
+		timens_on_fork(tsk->nsproxy, tsk);
+

Similarly, even after a normal vfork(), the time namespace switch can be silently skipped if the parent is killed before "tsk->vfork_done" is read, since the killed parent clears that field. Again, I don't know whether anybody cares, but this behavior seems non-obvious and, I suspect, unintended.

Thanks,
Alexey

[1] https://lore.kernel.org/all/20220613060723.197407-1-avagin@xxxxxxxxx/