Re: [PATCH] fhandle: fix UAF due to unlocked ->mnt_ns read in may_decode_fh()

From: Al Viro

Date: Wed Jun 03 2026 - 14:42:33 EST


On Wed, Jun 03, 2026 at 08:23:44PM +0200, Jann Horn wrote:
> On Wed, Jun 3, 2026 at 8:15 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > On Wed, Jun 03, 2026 at 07:38:06PM +0200, Jann Horn wrote:
> > > Fix it by taking rcu_read_lock() around the mount::mnt_ns access, like
> > > in __prepend_path().
> >
> > > + /*
> > > + * Containing namespace.
> > > + * Normally protected by namespace_sem, but there are also lockless
> > > + * readers (which must use RCU to guard against the namespace being
> > > + * freed).
> > > + */
> > > + struct mnt_namespace *mnt_ns;
> >
> > Umm... It's somewhat subtle - at the very least you need to explain why
> > there will be an RCU delay between umount_tree() clearing that and
> > having the sucker freed.
>
> I guess I could write something like this instead, to make it clear
> that this basically follows normal RCU rules, except that this code
> isn't actually using RCU markings and accessors?
>
> "This is like an __rcu pointer which is protected by RCU and
> namespace_sem; however, because most accesses happen under
> namespace_sem, it is not marked as __rcu, and RCU access is done with
> READ_ONCE()."
>
> Or we could put __rcu on this pointer, and annotate all the locked
> accesses with rcu_dereference_protected(...,
> lockdep_is_held(&namespace_lock)), but I guess you'd probably prefer
> to not do that?

Not the point... What I'm talking about is the reason why RCU access
to ->mnt_ns is valid in the first place - prompt freeing of namespace
instances *is* possible; we do have a guaranteed RCU delay between
zeroing ->mnt_ns and having the instance it pointed to freed, but it's
not instantly obvious where to look for it.

Basically, the store that cleared ->mnt_ns has been done in namespace_sem
scope and that scope is either no later than the scope in put_mnt_ns()
that has dropped the active refcount of ns to zero. At the beginning
of that scope in put_mnt_ns() we are guaranteed to have the passive
refcount positive. Dropping the passive reference happens after an
rcu delay started in later in the same namespace_sem scope and namespace
is not freed until the passive refcount reaches zero.