Re: [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock

From: Christian Brauner

Date: Wed May 27 2026 - 08:35:40 EST

On Wed, May 27, 2026 at 02:01:51PM +0200, Christian Brauner wrote:
> On Tue, May 26, 2026 at 08:22:38PM +0200, Jann Horn wrote:
> > On Mon, May 25, 2026 at 9:56 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> > > I have added a couple more people who might be interested.
> > >
> > > Kees Cook because as you have structured this it is an exec problem.
> > >
> > > Oleg Nesterov as he is knowledgable about ptrace.
> > >
> > > Jann Horn <jannh@xxxxxxxxxx> writes:
> > >
> > > > My understanding is that procfs is effectively maintained by the VFS
> > > > maintainers (though scripts/get_maintainer.pl claims that there are
> > > > no maintainers for procfs because the VFS entry only claims files
> > > > directly in fs/, and the procfs entry has no maintainers listed on
> > > > it).
> > > >
> > > > In procfs, most uses of ptrace_may_access() should use
> > > > exec_update_lock to avoid TOCTOU issues with concurrent privileged
> > > > execve() (like setuid binary execution).
> > > >
> > > > This series doesn't fix all the remaining issues in procfs, but it fixes
> > > > the easy cases for now; I will probably follow up with fixes for the
> > > > gnarlier cases later unless someone else wants to do that.
> > > >
> > > > I have checked that procfs files still work with these changes and that
> > > > CONFIG_PROVE_LOCKING=y doesn't generate any warnings.
> > > >
> > > > (checkpatch complains about missing argument names in
> > > > proc_op::proc_get_link, but that was already the case before my
> > > > patch.)
> > >
> > >
> > > I think I finally have my context paged back in so I can intelligently
> > > say something about this series.
> > >
> > > The scenario you are worried about is when exec gains privileges,
> > > and we read through proc and authenticate with the old credentials
> > > instead of the new credentials.
> > >
> > > Question 1.
> > >
> > > Assuming the executable is world readable (which they generally are)
> > > is there anything that becomes accessible in that race that was
> > > not already accessible?
> >
> > I believe so - the gnarliest example I am thinking of is:
> > Memfds are always mode 0777 or 0666 (see __shmem_file_setup, which
> > sets S_IRWXUGO), so their access control is purely based on being able
> > to pathwalk to the memfd's inode. If you can race
> > open(/proc/$pid/fd/$n) with the process $pid going through setuid
> > execution and calling memfd_create(), you should be able to get
> > read+write access to the memfd created by the setuid binary that was
> > supposed to be private.
> >
> > (But I have not tested that and don't know if there are actually any
> > setuid binaries that happen to use memfds.)
> >
> > > Question 2.
> > >
> > > How does this race compare to racing with setresuid?
> > > Do we need to fix the setresuid case as well?
> >
> > Which setresuid case? setresuid clears the dumpable flag and has a
> > memory barrier that is supposed to make that properly ordered against
> > ptrace_may_access(); so setresuid() should normally not cause a task
> > to become traceable, though that could maybe happen in weird
> > scenarios.
> >
> > I think another case we should probably care about is what happens if
> > a process which is only protected against ptrace by being non-dumpable
> > goes through execve() - it shouldn't be possible to access resources
> > associated with the pre-execve state while checking against the
> > post-execve dumpability. It might be important for this that the
> > do_close_on_exec() logic currently happens after committing the
> > dumpable state in exec_mmap()...
> >
> > > Question 3.
> > > Do we care about the case when a privileged process calls a setuid
> > > process and drops privileges?
> >
> > I don't understand the question. Hmm - do you mean a case where a
> > process with ruid=1000, euid=0, suid=1000 does execve() on a setuid
> > 1000 binary? I think we probably don't specifically care about that...
> >
> > I think another scenario that we ideally might want to care about is
> > what happens if a process which runs with a normal user's UIDs, but is
> > non-dumpable, goes through execve() of a normal binary while another
> > process tries to inspect its FDs or address space layout - it probably
> > shouldn't be possible to get information about the pre-execve MM and
> > O_CLOEXEC file descriptors.
> >
> > > Question 4.
> > > Is it possible to use a seq_lock instead of reader writer semaphore?
> > > Or is that only for non-sleeping readers?
> >
> > Linux seqcounts are 32-bit, which means they are always kind of dodgy,
> > but they are particularly dodgy if a reader can be forced to sleep for
> > an extended amount of time. I don't see a reason why we couldn't, in
> > general, use a 64-bit sequence count for readers that may need to
> > sleep while reading.
>
> I have a patch series for this that I started working after merging your
> series for precisely this reason: performance. It's a few days old now.
> I've tried various approaches and I started with a simple 32-bit counter
> as the POC. See appended (untested) patches.

In a bunch of cases we know that the critical section the callers cares
about just is very small: creds + mm. So in that case it is easy to
switch the credential computation into a prepare stage and a commit
stage and then the targeted critical section just becomes:
task->signal->seq_mm++ + task->cred = new_cred + task->mm = mm +
task->active_mm = mm + task->signal->seq_mm--. And then the reader
doesn't need to sleep at all and can just spin on the seqcount for the
small window they need.

I wasn't convinced it was valuable to use a fine-grained/multi-seqcount
approach though.