Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
From: Jann Horn
Date: Tue Jun 09 2026 - 13:55:20 EST
On Tue, Jun 9, 2026 at 8:08 AM Florian Weimer <fweimer@xxxxxxxxxx> wrote:
>
> * Jann Horn:
>
> >> Per the above, the primary win would stem from *NOT* messing with mm.
> >
> > As you write below, I think we have that with CLONE_VM? The C function
> > vfork() is kind of a terrible API because of its returns-twice
> > behavior, but I think if process cloning with CLONE_VM|CLONE_VFORK was
> > wrapped by libc in a way similar to clone() (with the child executing
> > a separate handler function), or if it was used in the implementation
> > of some higher-level process-spawning API, it would be a perfectly
> > fine API?
>
> No, there is still a problem with SIGTSTP handling because we cannot
> atomically unmask the signal during execve. We need to unblock SIGTSTP
> before execve in the new process, but this means that it can get
> suspended by SIGTSTP. Consequently, the execve never happens and the
> original process is stuck in vfork:
>
> posix_spawn: parent can get stuck in uninterruptible sleep if child
> receives SIGTSTP early enough
> <https://inbox.sourceware.org/libc-help/2921668c-773e-465d-9480-0abb6f979bf9@xxxxxxxxxxxxxxxx/>
>
> More on the low-level side, it's difficult to make sure that execve gets
> a consistent snapshot of the environ vector. Both vfork and execve need
> to be async-signal-safe. Any locking or memory allocation (except for
> the stack …) persists in the original process after vfork returns. The

I think that's not entirely accurate; if you register a robust futex
list with set_robust_list() and then call execve(), the kernel should
release the futexes on that list as the process drops its old MM, in
begin_new_exec -> exec_mmap -> exec_mm_release -> futex_exec_release
-> futex_cleanup -> exit_robust_list.
So in theory you could use clone() with CLONE_VM and without
CLONE_VFORK, and let the parent either wait on a futex that is
released on exec, or somehow asynchronously check later whether the
futex is still held... probably not the nicest building block, but
maybe workable? Though I guess it would fit more nicely if there were
a "munmap() this range on exec" API...
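
A rough sketch of the clone()-plus-robust-futex idea, to make the
moving parts concrete. Everything named here (wait_for_exec_release(),
struct exec_lock, struct spawn_ctx, the 64 KiB stack, the 1s timeout)
is made up for illustration. It assumes the child marks the futex word
with its own TID plus FUTEX_WAITERS, since as far as I can tell the
kernel only issues a wakeup from the robust-list cleanup when that bit
is set. Note that the same cleanup also runs on exit, so the parent
cannot tell a successful exec from a failed one with this alone.

```c
#define _GNU_SOURCE
#include <linux/futex.h>   /* struct robust_list_head, FUTEX_* */
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical lock layout: robust-list node plus the futex word. */
struct exec_lock {
    struct robust_list node;
    _Atomic uint32_t word;
};

struct spawn_ctx {
    struct robust_list_head head;
    struct exec_lock lock;
    const char *path;
};

/* Runs in the child. It shares the parent's MM (and TLS, so errno too). */
static int child_fn(void *arg)
{
    struct spawn_ctx *ctx = arg;

    /* Circular list: head -> lock -> head. */
    ctx->head.list.next = &ctx->lock.node;
    ctx->lock.node.next = &ctx->head.list;
    ctx->head.futex_offset = (long)offsetof(struct exec_lock, word) -
                             (long)offsetof(struct exec_lock, node);
    ctx->head.list_op_pending = NULL;
    syscall(SYS_set_robust_list, &ctx->head, sizeof(ctx->head));

    /* "Take" the lock as this task and pre-set FUTEX_WAITERS so that
     * the kernel wakes a waiter when it cleans up the list. */
    atomic_store(&ctx->lock.word,
                 (uint32_t)syscall(SYS_gettid) | FUTEX_WAITERS);

    char *const argv[] = { (char *)ctx->path, NULL };
    char *const envp[] = { NULL };
    execve(ctx->path, argv, envp);
    _exit(127); /* exec failed; the exit path releases the futex too */
}

/* Returns 1 once the kernel has marked the futex FUTEX_OWNER_DIED,
 * 0 if that was not observed in time, -1 on setup failure. */
int wait_for_exec_release(const char *path)
{
    enum { STACK_SIZE = 64 * 1024, MAX_WAITS = 10 };
    struct spawn_ctx ctx = { .path = path };
    int released = 0, status;

    char *stack = malloc(STACK_SIZE);
    if (!stack)
        return -1;

    /* CLONE_VM without CLONE_VFORK: the parent keeps running. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_VM | SIGCHLD, &ctx);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    for (int i = 0; i < MAX_WAITS; i++) {
        uint32_t v = atomic_load(&ctx.lock.word);
        if (v & FUTEX_OWNER_DIED) {
            released = 1;
            break;
        }
        /* Sleep until the word changes, is woken, or 1s elapses. */
        struct timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
        syscall(SYS_futex, &ctx.lock.word, FUTEX_WAIT, v, &ts, NULL, 0);
    }
    if (!released && (atomic_load(&ctx.lock.word) & FUTEX_OWNER_DIED))
        released = 1;

    /* Reap the child before freeing the stack it may still be using. */
    waitpid(pid, &status, 0);
    free(stack);
    return released;
}
```

With this, something like wait_for_exec_release("/bin/true") would be
expected to return 1 shortly after the child reaches the exec (or
exits). A real implementation would need to avoid touching shared TLS
from the child and report exec failure through a separate channel.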

> environ vector can be large, so making a copy on the stack is not ideal.
> It's even harder for getenv/setenv/unsetenv implementations that use
> locking instead of software transactional memory.

Makes sense; that sounds like pain inherent in supporting exec from
signal handler context...