Re: [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock

From: Christian Brauner

Date: Wed May 27 2026 - 08:02:06 EST

On Tue, May 26, 2026 at 08:22:38PM +0200, Jann Horn wrote:
> On Mon, May 25, 2026 at 9:56 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> > I have added a couple more people who might be interested.
> >
> > Kees Cook because as you have structured this it is an exec problem.
> >
> > Oleg Nesterov as he is knowledgable about ptrace.
> >
> > Jann Horn <jannh@xxxxxxxxxx> writes:
> >
> > > My understanding is that procfs is effectively maintained by the VFS
> > > maintainers (though scripts/get_maintainer.pl claims that there are
> > > no maintainers for procfs because the VFS entry only claims files
> > > directly in fs/, and the procfs entry has no maintainers listed on
> > > it).
> > >
> > > In procfs, most uses of ptrace_may_access() should use
> > > exec_update_lock to avoid TOCTOU issues with concurrent privileged
> > > execve() (like setuid binary execution).
> > >
> > > This series doesn't fix all the remaining issues in procfs, but it fixes
> > > the easy cases for now; I will probably follow up with fixes for the
> > > gnarlier cases later unless someone else wants to do that.
> > >
> > > I have checked that procfs files still work with these changes and that
> > > CONFIG_PROVE_LOCKING=y doesn't generate any warnings.
> > >
> > > (checkpatch complains about missing argument names in
> > > proc_op::proc_get_link, but that was already the case before my
> > > patch.)
> >
> >
> > I think I finally have my context paged back in so I can intelligently
> > say something about this series.
> >
> > The scenario you are worried about is when exec gains privileges,
> > and we read through proc and authenticate with the old credentials
> > instead of the new credentials.
> >
> > Question 1.
> >
> > Assuming the executable is world readable (which they generally are)
> > is there anything that becomes accessible in that race that was
> > not already accessible?
>
> I believe so - the gnarliest example I am thinking of is:
> Memfds are always mode 0777 or 0666 (see __shmem_file_setup, which
> sets S_IRWXUGO), so their access control is purely based on being able
> to pathwalk to the memfd's inode. If you can race
> open(/proc/$pid/fd/$n) with the process $pid going through setuid
> execution and calling memfd_create(), you should be able to get
> read+write access to the memfd created by the setuid binary that was
> supposed to be private.
>
> (But I have not tested that and don't know if there are actually any
> setuid binaries that happen to use memfds.)
>
> > Question 2.
> >
> > How does this race compare to racing with setresuid?
> > Do we need to fix the setresuid case as well?
>
> Which setresuid case? setresuid clears the dumpable flag and has a
> memory barrier that is supposed to make that properly ordered against
> ptrace_may_access(); so setresuid() should normally not cause a task
> to become traceable, though that could maybe happen in weird
> scenarios.
>
> I think another case we should probably care about is what happens if
> a process which is only protected against ptrace by being non-dumpable
> goes through execve() - it shouldn't be possible to access resources
> associated with the pre-execve state while checking against the
> post-execve dumpability. It might be important for this that the
> do_close_on_exec() logic currently happens after committing the
> dumpable state in exec_mmap()...
>
> > Question 3.
> > Do we care about the case when a privileged process calls a setuid
> > process and drops privileges?
>
> I don't understand the question. Hmm - do you mean a case where a
> process with ruid=1000, euid=0, suid=1000 does execve() on a setuid
> 1000 binary? I think we probably don't specifically care about that...
>
> I think another scenario that we ideally might want to care about is
> what happens if a process which runs with a normal user's UIDs, but is
> non-dumpable, goes through execve() of a normal binary while another
> process tries to inspect its FDs or address space layout - it probably
> shouldn't be possible to get information about the pre-execve MM and
> O_CLOEXEC file descriptors.
>
> > Question 4.
> > Is it possible to use a seq_lock instead of reader writer semaphore?
> > Or is that only for non-sleeping readers?
>
> Linux seqcounts are 32-bit, which means they are always kind of dodgy,
> but they are particularly dodgy if a reader can be forced to sleep for
> an extended amount of time. I don't see a reason why we couldn't, in
> general, use a 64-bit sequence count for readers that may need to
> sleep while reading.

I have a patch series for this that I started working after merging your
series for precisely this reason: performance. It's a few days old now.
I've tried various approaches and I started with a simple 32-bit counter
as the POC. See appended (untested) patches.