Re: [PATCH] eventfs: Have inodes have unique inode numbers

From: Linus Torvalds
Date: Sun Jan 28 2024 - 16:43:35 EST


On Sun, 28 Jan 2024 at 13:19, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> The deleting of the ei is done outside the VFS logic.

No.

You're fundamentally doing it wrong.

What you call "deletion" is just "remove from my hashes" or whatever.

The lifetime of the object remains entirely unrelated to that. It is
not free'd - removing it from the hashes should just be a reference
counter decrement.

> I use SRCU to synchronize looking at the ei children in the lookup.

That's just wrong.

Either you look things up under your own locks, in which case the SRCU
dance is unnecessary and pointless.

Or you use refcounts.

In which case SRCU is also unnecessary and pointless.

> On deletion, I
> grab the eventfs_mutex, set ei->is_freed and then wait for SRCU to
> finish before freeing.

Again, bogus.

Sure, you could do is "set ei->is_freed" to let any other users know
(if they even care - why would they?). You'd use *locking* to
serialize that.

btu that has *NOTHING* to do with actually freing the data structure,
and it has nothing to do with S$RCU - even if the locking might be
blocking.

Because *after* you have changed your data structures, and prefereably
after you have already dropped your locks (to not hold them
unnecessarily over any memory management) then you just do the normal
"free the reference count", because you've removed the ref from your
own data structures.

You don't use "use SRCU before freeing". You use the pattern I showed:

if (atomic_dec_and_test(&entry->refcount))
rcu_free(entry);

in a "put_entry()" function, and EVERYBODY uses that function when
they are done with it.

In fact, the "rcu_free()" is likely entirely unnecessary, since I
don't see that you ever look anything up under RCU.

If all your lookups are done under the eventfs_mutex lock you have, just do

if (atomic_dec_and_test(&entry->refcount))
kfree(entry);

and you're done. By definition, once the refcount goes down to zero,
there are no users, and if all your own data structures are maintained
with a lock, there is never ever any reason to use a RCU delay.

Sure, you'll have things like "dentry->d_fsdata" accesses that happen
before you even take the lock, but that's fine - the d_fsdata pointer
has a ref to it, so there's no locking needed for that lookup. It's
just a direct pointer dereference, and it's protected by the refcount.

No special cases. The user that sets "is_freed" is not special. Never
will be. It's just one reference among many others, and YOU DO NOT
CONTROL THE OTHER REFERENCES.

If you've given a ref to dentry->d_fsdata, it's no longer yours to
mess around with. All you can do is wait for the dentry to go away, at
which point you do the same "put_dentry()" because exactly like your
own data structures, it's JUST ANOTHER REFERENCE.

See what I'm saying?

Linus