Re: [RFC PATCH] vfs: limit directory child dentry retention
From: Mateusz Guzik
Date: Tue Mar 31 2026 - 11:05:16 EST
On Tue, Mar 31, 2026 at 05:54:01PM +0800, Gao Xiang wrote:
> JFYI, another issue we once observed on user workloads is that
>
> `d_lockref.count` can exceed `int` on very very large
> directories in reality (also combined with cached
> negative dentries).
>
> It can be a real overflow, this commit can help but it
> doesn't strictly resolve this, anyway.
Another way to contribute to the problem is to mass open the same file,
which results in one ref per fd.
Or to put it differently, sooner or later the dentry refcount will have
to switch to 64 bits on 64-bit systems.
There are 2 issues with it that I see:
1. no space
struct dentry is 192 bytes in size without any holes, so growing it is
an eyebrow-raiser.
Space can be freed either by shrinking shortname_store from 40 to 32
bytes or by converting the d_hash linkage to be singly linked. The
latter turns hash removals from O(1) into O(n), but that very traversal
is already performed to find the dentry during lookup. Thus if it
constitutes a problem, things are already bad in the sizing or hashing
department.
2. lockref itself
If one were to implement a lockref variant with an 8-byte count and a
4-byte spinlock, one would need 16-byte atomics, and those are
atrocious performance-wise.
Perhaps it would be feasible to hack the lock as a bit in the count, but
I don't think that's warranted.
The good news here is that lockref is already a performance problem
because of cmpxchg loops on both sides of ref/unref and AFAICS there is
a perfectly sensible way to move away from it.
Mandatory remark that numerous common syscalls can avoid the ref trip
in the common case, but getting there requires a lot of rototilling in
LSM code.
So the fastest thing would be lock xadd on both sides of course, but
going that far from the get-go is asking for trouble because of
baked-in assumptions that the count does not transition 0->1 while
d_lock is held.
Instead, a scheme which is already way faster than the current thing
would be: "lock cmpxchg" to grab the ref and "lock xadd" to release it,
with a dedicated bit spent to temporarily block lockless operation on
the ref side (any place which wants to keep the ref at 0 would have to
issue an atomic to freeze it first, and then the current guarantee is
provided).
There is no significant difficulty here as far as complexity goes, but
there is a lot of prerequisite churn to go through -- lockref use is
open-coded all over, and count access is inconsistent, either doing a
raw load or going through d_count().
I had a WIP patch to do it, but other churn-ey changes in dcache mean it
needs to be redone from scratch.
Maybe I'll get around to doing it.