Re: Potentially undesirable interactions between vfork() and time namespaces

From: Florian Weimer
Date: Thu Sep 01 2022 - 00:22:06 EST


* Andrei Vagin:

> On Tue, Aug 30, 2022 at 6:18 PM Andrei Vagin <avagin@xxxxxxxxx> wrote:
>> On Tue, Aug 30, 2022 at 10:49:43PM +0300, Alexey Izbyshev wrote:
> <snip>
>>> @@ -1030,6 +1033,10 @@ static int exec_mmap(struct mm_struct *mm)
>>>  	tsk->mm->vmacache_seqnum = 0;
>>>  	vmacache_flush(tsk);
>>>  	task_unlock(tsk);
>>> +
>>> +	if (vfork)
>>> +		timens_on_fork(tsk->nsproxy, tsk);
>>> +
>>>
>>> Similarly, even after a normal vfork(), time namespace switch could be
>>> silently skipped if the parent dies before "tsk->vfork_done" is read. Again,
>>> I don't know whether anybody cares, but this behavior seems non-obvious and
>>> probably unintended to me.
>> This is the more interesting case. I will try to find out how we can
>> handle it properly.
>
> It might not be a good idea to use vfork_done in this case. Let's
> think about what we have and what we want to change. We don't want to
> allow switching timens if a process mm is used by someone else. But we
> forgot to handle execve that creates a new mm, and we can't change this
> behavior right now because it can affect current users. Right?
>
> So maybe the best choice, in this case, is to change behavior by adding
> a new control that enables it. The first interface that comes to my mind
> is to introduce a new ioctl for a namespace file descriptor. Here is a
> draft patch below that should help to understand what I mean.

Doesn't this bring back the old posix_spawn (vfork) failure?
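
For reference, the failure in question can be sketched roughly as below.
This is only an illustration, not a test from this thread: the helper name
spawn_after_timens_unshare() is made up, "/bin/sh" is assumed to exist, and
it relies on glibc implementing posix_spawn() with
clone(CLONE_VM | CLONE_VFORK). On kernels that reject CLONE_VM when
time_ns differs from time_ns_for_children, the spawn should fail
(with EINVAL, as far as I can tell).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080	/* fallback for older libc headers */
#endif

extern char **environ;

/*
 * Hypothetical helper (not from the patch under discussion).
 * Returns -1 if the time namespace could not be unshared (test skipped),
 * 0 if posix_spawn() worked, or the error number it reported.
 */
static int spawn_after_timens_unshare(void)
{
	char *argv[] = { "sh", "-c", ":", NULL };
	pid_t pid;
	int status;
	int err;

	/* Needs CAP_SYS_ADMIN and a kernel built with CONFIG_TIME_NS. */
	if (unshare(CLONE_NEWTIME) != 0) {
		fprintf(stderr, "unshare(CLONE_NEWTIME): %s (skipping)\n",
			strerror(errno));
		return -1;
	}

	/* From here on, time_ns_for_children differs from time_ns. */
	err = posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ);
	if (err != 0) {
		fprintf(stderr, "posix_spawn: %s\n", strerror(err));
		return err;
	}

	/* Reap the child so it does not linger as a zombie. */
	if (waitpid(pid, &status, 0) < 0)
		return errno;
	return 0;
}
```

Calling the helper from main() as root and printing its return value is
enough to see whether a given kernel lets the vfork-style spawn through.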

Thanks,
Florian