Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup

From: Li Chen

Date: Wed Jun 10 2026 - 08:32:16 EST

Hi John,

---- On Wed, 10 Jun 2026 01:27:47 +0800 John Ericson <mail@xxxxxxxxxxxxxx> wrote ---
>
>
> On Tue, Jun 9, 2026, at 10:43 AM, Li Chen wrote:
> > Hi Andy,
> >
> > ---- On Tue, 09 Jun 2026 08:01:57 +0800 Andy Lutomirski <luto@xxxxxxxxxx> wrote ---
> > > [...]
> > >
> > > After contemplating this for a bit... why pidfd? Doesn't a pidfd
> > > refer to an actual process that is, or at least was, running? This
> > > new thing is a process that we are contemplating spawning. I can
> > > imagine that basically all pidfd APIs would be a bit confused by the
> > > nonexistence of the process in question.
> > >
> >
> > Yes, I think that is a real concern.
> >
> > In my current local WIP I tried to keep that distinction explicit.
> > pidfd_spawn_open() returns a pidfs-backed builder fd, not a normal pidfd
> > referring to a process. The builder fd is allocated as an anonymous pidfs
> > file with builder-specific file operations:
> >
> >     file = pidfs_alloc_anon_file("[pidfd_spawn]",
> >                                  &pidfd_spawn_builder_fops, builder,
> >                                  O_RDWR);
> >
>
> What does your builder fd point to, explicitly? For example in my other reply I
> talked about how it was "real" process state. In my FreeBSD patch, for example,
> I found there was already a status for a process "in exec", and I figured that
> was clean to reuse for one of these "embryonic" processes that also hadn't
> started running. I would reckon that Linux probably has some similar notions.
>
> > and the normal pidfd helpers still reject it because it does not use the
> > ordinary pidfd file operations:
> >
> >     struct pid *pidfd_pid(const struct file *file)
> >     {
> >             if (file->f_op != &pidfs_file_operations)
> >                     return ERR_PTR(-EBADF);
> >             return file_inode(file)->i_private;
> >     }
> >
> > So the current split is:
> >
> >     builder_fd = pidfd_spawn_open(...);                  /* builder object */
> >     pidfd_config(builder_fd, ...);
> >     child_pidfd = pidfd_spawn_run(builder_fd, ...);      /* real pidfd */
> >
> > Only the last fd is a normal pidfd for an actual child process. The builder
> > fd is only accepted by the builder operations.
> >
> > This avoids having to define what waitid(P_PIDFD), pidfd_send_signal(),
> > pidfd_getfd(), poll(), etc. mean before the process exists.
>
> I wouldn't be so sure this is necessary/good. For example, I think it could
> make sense to wait on a process that has yet to be started; one just waits for
> both the process to start and the process to exit. Obviously a blocking syscall
> in the thread that is spawning the process is not useful, but the asynchronous
> poll variation seems fine.
>
> As long as there is real process state here, it shouldn't be too hard to
> implement.
>
> > The downside is that it adds a separate open-style entry point and is less
> > uniform than the pidfd_open(0, PIDFD_EMPTY) spelling Christian sketched.
>
> I do think there is no point having two file descriptors. The file descriptor
> that previously referred to the builder/embryonic process then can refer to the
> real process, right?
>
> > If people think there is a better way to represent the pre-spawn builder
> > state, or if the preference is to integrate it directly into pidfd_open()
> > with an explicit empty/future-pidfd state, I would be happy to discuss that.
>
> Hope the above answers your question? I suppose my ideas lean more on the
> "future" than "empty" side --- there is indeed a thread in the thread group,
> with real VM/namespace/file descriptor etc. state. Moreover, state gets
> initialized before the process is started, so the actual start is a pretty
> lightweight step of just letting the scheduler know the now-ready process can
> be scheduled. The only thing that distinguishes the embryonic process from a
> real one is simply that it isn't running --- i.e. isn't (yet) available to be
> scheduled --- so the pidfd holders are free to poke at its state.
>
> Cheers,
>
> John
>

Thanks, this helped a lot. I looked at FreeBSD/OpenBSD/XNU after your
note. FreeBSD has P_INEXEC, OpenBSD has PS_INEXEC, and XNU seems even
closer with P_LINTRANSIT, described as "process in exec or in creation".
Linux does not seem to have a single equivalent today: current->in_execve
is only an LSM hint, while the real synchronization is spread across
exec_update_lock, cred_guard_mutex, and the exec path.

I am switching my local WIP from the two-fd builder model to one fd,
closer to Christian's sketch:

    fd = pidfd_open(0, PIDFD_EMPTY);
    pidfd_config(fd, ...);
    pidfd_spawn_run(fd, ...);

In my current local version, I still use copy_process(), so the fd points
at a real task_struct/pid that is not woken until pidfd_spawn_run(). Following
Christian's point that existing APIs can handle this not-yet-running case
with ESRCH, I currently make ordinary pidfd operations that need a real
started process return -ESRCH before start.
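
As a rough illustration of that rule, here is a toy userspace model of the
one-fd lifecycle. It is not the kernel code: the fut_* names, the three
states, and the -EINVAL/-EBUSY choices are made up for this sketch. Only
the "-ESRCH before start" behaviour corresponds to what the WIP does.

```c
/* Toy userspace model of a "future pidfd"; illustration only. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lifecycle states (not kernel names). */
enum fut_state { FUT_EMPTY, FUT_CONFIGURED, FUT_RUNNING };

struct fut_pidfd {
	enum fut_state state;
	const char *path;	/* executable chosen by config */
	int last_sig;		/* last signal accepted after start */
};

/* Models pidfd_open(0, PIDFD_EMPTY): nothing is configured yet. */
void fut_open(struct fut_pidfd *f)
{
	f->state = FUT_EMPTY;
	f->path = NULL;
	f->last_sig = 0;
}

/* Models pidfd_config(): only allowed before the process is started. */
int fut_config(struct fut_pidfd *f, const char *path)
{
	if (f->state == FUT_RUNNING)
		return -EBUSY;		/* assumed error code */
	if (!path)
		return -EINVAL;
	f->path = path;
	f->state = FUT_CONFIGURED;
	return 0;
}

/* Models pidfd_spawn_run(): the only step that makes the task runnable. */
int fut_run(struct fut_pidfd *f)
{
	if (f->state == FUT_EMPTY)
		return -EINVAL;		/* assumed: nothing configured */
	if (f->state == FUT_RUNNING)
		return -EBUSY;		/* assumed: already started */
	f->state = FUT_RUNNING;
	return 0;
}

/* Models an ordinary pidfd operation such as pidfd_send_signal(). */
int fut_send_signal(struct fut_pidfd *f, int sig)
{
	if (f->state != FUT_RUNNING)
		return -ESRCH;		/* no started process yet */
	f->last_sig = sig;
	return 0;
}

/* Walks the whole lifecycle once; returns 0 if every step behaves. */
int fut_demo(void)
{
	struct fut_pidfd f;

	fut_open(&f);
	assert(fut_send_signal(&f, 15) == -ESRCH);	/* empty */
	assert(fut_run(&f) == -EINVAL);
	assert(fut_config(&f, "/bin/true") == 0);
	assert(fut_send_signal(&f, 15) == -ESRCH);	/* configured */
	assert(fut_run(&f) == 0);
	assert(fut_send_signal(&f, 15) == 0);		/* running */
	assert(fut_config(&f, "/bin/false") == -EBUSY);
	return 0;
}
```

The point of the model is that a caller can treat -ESRCH before start the
same way it already treats -ESRCH for a process that is gone, so no new
error handling is needed for the not-yet-running case.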

I am not sure yet whether Linux should grow a general exec/creation
transition state like P_INEXEC or P_LINTRANSIT, or whether a narrower
future-process lifecycle is enough for this API. I will think more about
that when working on the pristine process version.

Regards,
Li