Re: For review: documentation of clone3() system call

From: Florian Weimer
Date: Mon Nov 11 2019 - 10:20:54 EST


* Jann Horn:

> On Mon, Nov 11, 2019 at 4:03 PM Florian Weimer <fweimer@xxxxxxxxxx> wrote:
>>
>> * Michael Kerrisk:
>>
>> > Another difference for the raw clone() system call is that the
>> > stack argument may be NULL, in which case the child uses a duplicate
>> > of the parent's stack. (Copy-on-write semantics ensure that
>> > the child gets separate copies of stack pages when either process
>> > modifies the stack.) In this case, for correct operation, the
>> > CLONE_VM option should not be specified. (If the child shares the
>> > parent's memory because of the use of the CLONE_VM flag, then no
>> > copy-on-write duplication occurs and chaos is likely to result.)
>>
>> I think sharing the stack also works with CLONE_VFORK with CLONE_VM, as
>> long as measures are taken to preserve the return address in a register.
>
> That basically just requires that the userspace function declaration
> for clone3 includes __attribute__((returns_twice)), right?

The clone3 implementation itself would have to keep the return address
in a register. By the time of the second return (the one in the parent),
a return address stored on the stack may have been overwritten by the
subprocess, because what used to be the stack frame of the clone
function has since been reused for something else.

__attribute__ ((returns_twice)) is likely needed as well, but that
benefits the caller. It's also not clear that it is sufficient for this
to work in all cases. (But the mandatory-to-implement vfork function
faces the same problems.)
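
For illustration, here is a minimal sketch of the calling pattern that
vfork permits, assuming Linux with a libc that provides vfork. The name
run_vfork_child is made up for this example. Any return-address handling
of the kind described above would have to happen inside the libc wrapper
and is not visible at this level; the sketch only shows that the child
does nothing but _exit while it borrows the parent's stack.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a vfork child that exits with `code`,
 * and return the exit status observed by the parent (-1 on error). */
int run_vfork_child(int code)
{
    pid_t pid = vfork();

    if (pid < 0)
        return -1;              /* vfork failed */

    if (pid == 0) {
        /* Child: shares the parent's memory and stack, so it must
         * not return from this function; it only calls _exit. */
        _exit(code);
    }

    /* Parent: resumes only after the child has exited (or exec'd). */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_vfork_child(7) from a test program should yield 7 in the
parent. A child that called further functions or returned would be
reusing stack slots the parent still depends on, which is the hazard
being discussed.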

Thanks,
Florian