Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files

From: yangerkun

Date: Thu May 07 2026 - 23:08:33 EST

在 2026/5/8 10:45, yangerkun 写道:

Hello Chuck,

在 2026/5/8 0:12, Chuck Lever 写道:

Hello Erkun -

On Thu, May 7, 2026, at 11:09 AM, yangerkun wrote:

Hi,

在 2026/5/1 22:51, Chuck Lever 写道:

Misbah Anjum reported a use-after-free in cache_check_rcu()
reached through e_show() while sosreport was reading
/proc/fs/nfsd/exports on ppc64le. Two fixes for that report
landed in v7.0:

    48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
    e7fcf179b82d ("NFSD: Hold net reference for the lifetime of / proc/fs/nfs/exports fd")

Back to the problem fixed by this patches, I'm a little confused why
this UAF can be trigged.

Before this patches, svc_export_put show as follow:

   368 static void svc_export_put(struct kref *ref)
   369 {
   370         struct svc_export *exp = container_of(ref, struct
svc_export, h.ref);
   371
   372         path_put(&exp->ex_path);
   373         auth_domain_put(exp->ex_client);
   374         call_rcu(&exp->ex_rcu, svc_export_release);
   375 }

The auth_domain_put function releases ->name using call_rcu, and
path_put may release the dentry also via call_rcu. All of this seems to
prevent e_show from causing a UAF. Could you point out which line in
d_path triggers the issue?

The dentry, the mount, and the auth_domain ->name buffer all
end up RCU-freed (dentry_free() and delayed_free_vfsmnt in
fs/, svcauth_unix_domain_release_rcu() in svcauth_unix.c).
The eventual kfree isn't the problem.

The problem is the synchronous teardown inside path_put(),
which runs before svc_export_put() ever reaches its own
call_rcu():

   path_put(&exp->ex_path)
     -> dput(dentry)
        -> __dentry_kill()              [if last ref]
           -> __d_drop()                /* unhashes */
           -> dentry_unlink_inode()     /* d_inode = NULL */
           -> d_op->d_release() if set
           -> drops parent d_lockref    /* may cascade up */
           -> dentry_free()             /* call_rcu deferred */
     -> mntput(mnt)                     /* deferred via task_work */

The dentry pointer itself is RCU-safe, so prepend_path()'s walk
of d_parent and d_name doesn't read freed memory. But by the
time the reader gets there, __d_clear_type_and_inode() has
already stored NULL into d_inode, __d_drop() has broken the
hash linkage, and the parent's d_lockref has been decremented
-- which can in turn fire __dentry_kill() on the parent, and
on up the tree. An e_show() that's still inside its cache RCU
read section walks into that half-dismantled state through
seq_path(), and that's the NULL deref Misbah reported.

Thank you for your detailed explanation! Yes, e_show might be called when the state is partially dismantled, but after carefully reviewing the code with dput up to __dentry_kill, I still cannot find anything that could cause this issue. Additionally, the comments for prepend_path indicate that they have already taken into account that the dentry can be removed concurrently. I have also run some tests on my arm64 QEMU, but I couldn't reproduce the problem either. Could you please help me identify the specific line or pointer in the dentry that triggers this use-after-free or null pointer issue?

Maybe I am not be very familiar with the code, which caused me to fail to identify the real root cause. I'm so sorry for that.

265 char *d_path(const struct path *path, char *buf, int buflen)
266 {
267         DECLARE_BUFFER(b, buf, buflen);
268         struct path root;
269
270         /*
271          * We have various synthetic filesystems that never get mounted. On
272          * these filesystems dentries are never used for lookup purposes, and
273          * thus don't need to be hashed. They also don't need a name until a
274          * user wants to identify the object in /proc/pid/fd/. The little hack
275          * below allows us to generate a name for these objects on demand:
276          *
277          * Some pseudo inodes are mountable. When they are mounted
278          * path->dentry == path->mnt->mnt_root. In that case don't call d_dname
279          * and instead have d_path return the mounted path.
280          */
281         if (path->dentry->d_op && path->dentry->d_op->d_dname &&
282             (!IS_ROOT(path->dentry) || path->dentry != path->mnt- >mnt_root))
283                 return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
284
285         rcu_read_lock();
286         get_fs_root_rcu(current->fs, &root);
287         if (unlikely(d_unlinked(path->dentry)))
288                 prepend(&b, " (deleted)", 11);
289         else
290                 prepend_char(&b, 0);
291         prepend_path(path, &root, &b);
292         rcu_read_unlock();
293
294         return extract_string(&b);
295 }

The earlier fix (2530766492ec, "nfsd: fix UAF when access
ex_uuid or ex_stats") moved the kfree of ex_uuid and ex_stats
into svc_export_release() so those are RCU-safe now.
path_put() and auth_domain_put() couldn't go in there because
both may sleep, and call_rcu callbacks run in softirq context.
This series uses queue_rcu_work() instead: it defers past the
grace period AND runs the callback in process context, so the
sleeping puts move into the deferred path and the window
closes.

Yeah, I can get this! Thanks again for your detail explanation!

Also, could the scenario described in this commit be triggered again?

commit 69d803c40edeaf94089fbc8751c9b746cdc35044
Author: Yang Erkun <yangerkun@xxxxxxxxxx>
Date: Mon Dec 16 22:21:52 2024 +0800

nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"

This reverts commit f8c989a0c89a75d30f899a7cabdc14d72522bb8d.

Before this commit, svc_export_put or expkey_put will call path_put with
sync mode. After this commit, path_put will be called with async mode.
And this can lead the unexpected results show as follow.

mkfs.xfs -f /dev/sda
echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports
exportfs -ra
service nfs-server start
mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1
mount /dev/sda /mnt/sda
touch /mnt1/sda/file
exportfs -r
umount /mnt/sda # failed unexcepted

The touch will finally call nfsd_cross_mnt, add refcount to mount, and
then add cache_head. Before this commit, exportfs -r will call
cache_flush to cleanup all cache_head, and path_put in
svc_export_put/expkey_put will be finished with sync mode. So, the
latter umount will always success. However, after this commit, path_put
will be called with async mode, the latter umount may failed, and if
we add some delay, umount will success too. Personally I think this bug
and should be fixed. We first revert before bugfix patch, and then fix
the original bug with a different way.

Fixes: f8c989a0c89a ("nfsd: release svc_expkey/svc_export with rcu_work")
Signed-off-by: Yang Erkun <yangerkun@xxxxxxxxxx>
Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>

Thanks,
Erkun.