Re: [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock

From: Jann Horn

Date: Thu May 28 2026 - 10:28:28 EST

On Thu, May 28, 2026 at 3:11 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> Jann Horn <jannh@xxxxxxxxxx> writes:
>
> > On Mon, May 25, 2026 at 9:56 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> >> I have added a couple more people who might be interested.
> >>
> >> Kees Cook because as you have structured this it is an exec problem.
> >>
> >> Oleg Nesterov as he is knowledgable about ptrace.
> >>
> >> Jann Horn <jannh@xxxxxxxxxx> writes:
> >>
> >> > My understanding is that procfs is effectively maintained by the VFS
> >> > maintainers (though scripts/get_maintainer.pl claims that there are
> >> > no maintainers for procfs because the VFS entry only claims files
> >> > directly in fs/, and the procfs entry has no maintainers listed on
> >> > it).
> >> >
> >> > In procfs, most uses of ptrace_may_access() should use
> >> > exec_update_lock to avoid TOCTOU issues with concurrent privileged
> >> > execve() (like setuid binary execution).
> >> >
> >> > This series doesn't fix all the remaining issues in procfs, but it fixes
> >> > the easy cases for now; I will probably follow up with fixes for the
> >> > gnarlier cases later unless someone else wants to do that.
> >> >
> >> > I have checked that procfs files still work with these changes and that
> >> > CONFIG_PROVE_LOCKING=y doesn't generate any warnings.
> >> >
> >> > (checkpatch complains about missing argument names in
> >> > proc_op::proc_get_link, but that was already the case before my
> >> > patch.)
> >>
> >>
> >> I think I finally have my context paged back in so I can intelligently
> >> say something about this series.
> >>
> >> The scenario you are worried about is when exec gains privileges,
> >> and we read through proc and authenticate with the old credentials
> >> instead of the new credentials.
> >>
> >> Question 1.
> >>
> >> Assuming the executable is world readable (which they generally are)
> >> is there anything that becomes accessible in that race that was
> >> not already accessible?
> >
> > I believe so - the gnarliest example I am thinking of is:
> > Memfds are always mode 0777 or 0666 (see __shmem_file_setup, which
> > sets S_IRWXUGO), so their access control is purely based on being able
> > to pathwalk to the memfd's inode. If you can race
> > open(/proc/$pid/fd/$n) with the process $pid going through setuid
> > execution and calling memfd_create(), you should be able to get
> > read+write access to the memfd created by the setuid binary that was
> > supposed to be private.
> >
> > (But I have not tested that and don't know if there are actually any
> > setuid binaries that happen to use memfds.)
>
> I don't know about memfds. I do know it has been a concern in the past
> about opening proc using credentials before exec and then using the
> credentials after exec.
>
> We certainly closed some of those races, if there are still some
> of those races present we should definitely close them.
>
> In my thinking there are the set of races that exist because we fail to
> present exec to userspace as an atomic operation. Then there are larger

Yes, and exec_update_lock is how we make sure we can present the core
of exec to userspace as an atomic operation.

> set of races that exist simply because exec happened.
>
>
> Looking at it proc fd's currently rely on the permission checks
> of the file descriptors themselves and don't make any guarantees
> about the path.

What do you mean by "the permission checks of the file descriptors themselves"?

> The issue you point out with memfd's definitely needs to be fixed.
> It should be separated out from the rest of the races simply because
> it is a completely different kind of issue.
>
> I wonder if anyone even anticipated you could open another file handle
> to memfd's through proc. If so leaving everything to path based
> permissions assumed a feature of proc that doesn't exist.

I don't think memfds are particularly special here, they are just a
nice, clear example of a case where an inode is protected based on
which processes can path-walk to it.

As another example: Making a directory mode 0700 is also supposed to
prevent other users from accessing things inside it.

> My gut says the best fix for the entire memfd issue is to simply change
> memfd's and probably everything that calls shmem_file_setup to not have
> an open method. That eliminates any chance anyone will do anything
> clever with proc. But I can't see why it makes any sense to be able to
> open another file handle into memfd's, or anything else that calls
> shmem_file_setup for that matter.
>
> We can first try to remove the open method of memfd's set by
> shmem_file_setup, and if that doesn't work we can look at fixing proc to
> provide the guarantees that were assumed (as a security fix).
>
>
> As a quality of implementation issue I can see fixing the small race
> where when looking up a file descriptor through proc, exec does not
> appear to be an atomic operation. I keep wondering if that is something
> that should be done in get_link or d_revalidate.

I don't see how d_revalidate would help, that still wouldn't be atomic.

> I suspect the answer for proc_pid_get_link is to either cache something
> like a seqcount, or simply to repeat the permission and existence checks
> just before calling nd_jump_link.

That seems like it results in complicated semantics, while a mutex
would provide clear semantics. Which is already what we use in places
like __pidfd_fget() and /proc/<pid>/syscall.

> >> Question 2.
> >>
> >> How does this race compare to racing with setresuid?
> >> Do we need to fix the setresuid case as well?
> >
> > Which setresuid case? setresuid clears the dumpable flag and has a
> > memory barrier that is supposed to make that properly ordered against
> > ptrace_may_access(); so setresuid() should normally not cause a task
> > to become traceable, though that could maybe happen in weird
> > scenarios.
>
> The cases where the dumpable flag get set are all part of exec.
>
> I was thinking of cases where we have a daemon that is started by
> root and then it changes it's uid to do something.

(Normally such a daemon would only change its EUID, which is mainly
considered when the daemon acts as a subject, unless it intends to
permanently drop privileges.)

> Looking at ptrace_may_access the uid based checks won't allow
> accessing of such a task unless it changes all of it's uids.
>
> At which point arguably it is on the process that calls setuid to make
> certain ptracing it won't be a problem. I am not certain that ever
> actually works in practice but that does seem to be what the current
> code is saying.

When a daemon changes its EUID for some reason, commit_creds() will
change that daemon's dumpability to suid_dumpable, which will prevent
ptracing it even if the UIDs match.

The ptrace.2 manpage guarantees that a process which is not
SUID_DUMP_USER can't be accessed without CAP_SYS_PTRACE.

> Now I am wondering if dumpable should get set if setresuid changes
> a uid like I described above.

What do you mean by "get set"?

> > I think another case we should probably care about is what happens if
> > a process which is only protected against ptrace by being non-dumpable
> > goes through execve() - it shouldn't be possible to access resources
> > associated with the pre-execve state while checking against the
> > post-execve dumpability. It might be important for this that the
> > do_close_on_exec() logic currently happens after committing the
> > dumpable state in exec_mmap()...
> >
> >> Question 3.
> >> Do we care about the case when a privileged process calls a setuid
> >> process and drops privileges?
> >
> > I don't understand the question. Hmm - do you mean a case where a
> > process with ruid=1000, euid=0, suid=1000 does execve() on a setuid
> > 1000 binary? I think we probably don't specifically care about that...
> >
>
> The general case would be a daemon running as root forks and exec's a
> binary running as some unprivileged user fred.
>
> Mostly I bring it up is that it is easy to forget suid exec can drop
> privileges as well as raise them.

I do not understand the scenario you're describing. Can you give a
specific example you're thinking of - what the ruid/euid/suid would be
before execve(), and which mode the binary would be?