At the beginning, our testing team discovered this issue on 5.10 using
I'm not convinced this is the right approach.
On Wed, 22 Jan 2025, Jeff Layton wrote:
To be clear, I think we need to drop e57420be100ab from your nfsd-
On Wed, 2025-01-15 at 10:03 -0500, Chuck Lever wrote:
Holding the RCU read lock will keep the dereferences safe since
On 1/14/25 2:39 PM, Jeff Layton wrote:
Actually, I take it back. This is problematic in another way.
On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
I think this looks OK. Filecache bugs are particularly nasty though, so
On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
list, gc may be triggered in another thread and immediately release this
nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
All the places where unhash is done will also perform lru_remove, so there
is no need to do lru_remove separately here. After inserting the nfsd_file
into the nfsd_file_lru list, it can be released by relying on gc.
Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
Signed-off-by: Li Lingfeng <lilingfeng3@xxxxxxxxxx>
---
fs/nfsd/filecache.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a1cdba42c4fa..37b65cb1579a 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
 		/* Try to add it to the LRU. If that fails, decrement. */
 		if (nfsd_file_lru_add(nf)) {
 			/* If it's still hashed, we're done */
-			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+			if (list_lru_count(&nfsd_file_lru))
 				nfsd_file_schedule_laundrette();
-				return;
-			}
-			/*
-			 * We're racing with unhashing, so try to remove it from
-			 * the LRU. If removal fails, then someone else already
-			 * has our reference.
-			 */
-			if (!nfsd_file_lru_remove(nf))
-				return;
+			return;
 		}
 	}
 	if (refcount_dec_and_test(&nf->nf_ref))
let's run this through a nice long testing cycle.
Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
In this case, we're racing with another task that is unhashing the
object, but we've put it on the LRU ourselves. What guarantee do we
have that the unhashing and removal from the LRU didn't occur before
this task called nfsd_file_lru_add()? That's why we attempt to remove
it here -- we can't rely on the task that unhashed it to do so at that
point.
What might be best is to take and hold the rcu_read_lock() before doing
the nfsd_file_lru_add, and just release it after we do these racy
checks. That should make it safe to access the object.
Thoughts?
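
To make the interleaving described above concrete, here is a minimal
single-threaded userspace sketch in C. It is not the kernel code: struct
toy_file and every toy_* function are hypothetical stand-ins invented for
illustration, and the model collapses the two concurrent tasks into one
fixed order, in which the unhash (with its LRU removal attempt) completes
before the put path adds the entry to the LRU. Under those assumptions it
only shows what state each variant of the put path leaves behind. Whether
the garbage collector would later reap such an entry is outside the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct nfsd_file. Hypothetical; not the kernel type. */
struct toy_file {
	bool hashed;	/* models NFSD_FILE_HASHED */
	bool on_lru;	/* models membership in nfsd_file_lru */
	int ref;	/* models nf_ref */
};

/* Models nfsd_file_lru_add(): true if the entry was newly added. */
static bool toy_lru_add(struct toy_file *f)
{
	if (f->on_lru)
		return false;
	f->on_lru = true;
	return true;
}

/* Models nfsd_file_lru_remove(): true if the entry was on the list. */
static bool toy_lru_remove(struct toy_file *f)
{
	if (!f->on_lru)
		return false;
	f->on_lru = false;
	return true;
}

/* Models the other task: unhash, then drop the entry from the LRU. */
static void toy_unhash(struct toy_file *f)
{
	f->hashed = false;
	if (toy_lru_remove(f))
		f->ref--;	/* drop the reference the LRU held */
}

/* Put path that keeps the fallback removal (the logic the patch deletes). */
static void toy_put_with_fallback(struct toy_file *f)
{
	if (toy_lru_add(f)) {
		if (f->hashed)
			return;		/* the LRU now owns our reference */
		if (!toy_lru_remove(f))
			return;		/* someone else took our reference */
	}
	f->ref--;
}

/* Put path as modified by the patch: never removes the entry itself. */
static void toy_put_patched(struct toy_file *f)
{
	if (toy_lru_add(f))
		return;
	f->ref--;
}

/* Assumed interleaving: the unhash completes before the put runs. */
static struct toy_file toy_race(void (*put)(struct toy_file *))
{
	struct toy_file f = { .hashed = true, .on_lru = false, .ref = 1 };

	assert(put);		/* a put-path model must be supplied */
	toy_unhash(&f);		/* finds nothing on the LRU to remove */
	put(&f);		/* only now adds the entry to the LRU */
	return f;
}
```

In this model, toy_race(toy_put_with_fallback) ends with the entry off the
LRU and its reference dropped, while toy_race(toy_put_patched) leaves an
unhashed entry sitting on the LRU and still holding a reference.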
nfsd_file objects are freed only after an RCU grace period. But will the
logic in nfsd_file_put() work properly on totally dead nfsd_file
objects? I don't see a specific failure mode there, but I'm not very
imaginative.
Overall, I think RCU would help.
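
The property relied on here (an object queued for freeing is not reclaimed
while a reader is still inside its read-side section) can be illustrated
with a small userspace sketch in C. This is a toy model, not RCU and not
the kernel API: toy_read_lock(), toy_read_unlock(), toy_free_deferred() and
struct toy_file are hypothetical names made up for the example, everything
runs in one thread, and "reclaiming" only sets a flag so the result can be
inspected safely. It does not address the open question of whether the
nfsd_file_put() logic behaves sensibly on an already dead object.

```c
#include <stdbool.h>

/* Toy stand-in for struct nfsd_file. Hypothetical; not the kernel type. */
struct toy_file {
	bool hashed;			/* models NFSD_FILE_HASHED */
	bool reclaimed;			/* set once the toy grace period ends */
	struct toy_file *next_deferred;	/* link in the deferred-free queue */
};

static int toy_readers;			/* depth of active read-side sections */
static struct toy_file *toy_deferred;	/* objects waiting to be reclaimed */

/* Reclaim every queued object, but only when no reader is active. */
static void toy_grace_period(void)
{
	if (toy_readers > 0)
		return;
	while (toy_deferred) {
		struct toy_file *f = toy_deferred;

		toy_deferred = f->next_deferred;
		f->reclaimed = true;	/* real code would free the memory here */
	}
}

static void toy_read_lock(void)
{
	toy_readers++;
}

static void toy_read_unlock(void)
{
	toy_readers--;
	toy_grace_period();
}

/* Models a deferred free: queue the object, reclaim it when allowed. */
static void toy_free_deferred(struct toy_file *f)
{
	f->next_deferred = toy_deferred;
	toy_deferred = f;
	toy_grace_period();
}

/*
 * Put path bracketed by the read-side section. gc_runs models another
 * task freeing the object right after it was added to the LRU. Returns
 * true if the object was still intact at the point of the racy check.
 */
static bool toy_put_under_read_lock(struct toy_file *f, bool gc_runs)
{
	bool intact;

	toy_read_lock();
	/* ... the entry would be added to the LRU here ... */
	if (gc_runs)
		toy_free_deferred(f);	/* queued, but not reclaimed yet */
	intact = !f->reclaimed;		/* the racy check would happen here */
	toy_read_unlock();		/* reclaim may proceed only now */
	return intact;
}

/* Same sequence without the bracket: the object is reclaimed at once. */
static bool toy_put_unprotected(struct toy_file *f, bool gc_runs)
{
	if (gc_runs)
		toy_free_deferred(f);
	return !f->reclaimed;
}
```

With gc_runs set, toy_put_under_read_lock() still sees an intact object and
the reclaim happens only at toy_read_unlock(), whereas
toy_put_unprotected() finds the object already reclaimed at the check.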
testing branch. The race I identified above is quite likely to occur
and could lead to leaks.
If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
think the RCU approach is safe.
I cannot see how nfsd_file_put() can race with unhashing. If it cannot
then we can simply unconditionally call nfsd_file_schedule_laundrette().
Can you describe how the race can happen, if indeed it can?
Note that we also need rcu protection in nfsd_file_lru_add() so that the
nf doesn't get freed after it is added to the lru and before the trace
point. If we don't end up needing rcu round the call, we will need it
in the call.
Thanks,
NeilBrown