Re: [PATCH] eventfs: Have inodes have unique inode numbers
From: Steven Rostedt
Date: Mon Jan 29 2024 - 04:34:48 EST
On Mon, 29 Jan 2024 08:44:29 +0200
Amir Goldstein <amir73il@xxxxxxxxx> wrote:
Hi Amir,
[ Suffering a bit of insomnia, I made the mistake of reading email from
bed, and now I have to reply :-p ]
> >
> > And I do it just like debugfs when it deletes files outside of VFS or
> > procfs, and pretty much most virtual file systems.
> >
>
> I think it is better if we used the term "pseudo" file systems, because
> to me VFS already stands for "virtual file system".
Oops, you are absolutely correct. I meant "pseudo" but somehow switch
to saying "virtual". :-p
> >
> > Sorry, but you didn't prove your point. The example you gave me is
> > already covered. Please tell me when a kprobe goes away, how do I let
> > VFS know? Currently the tracing code (like kprobes and synthetic
> > events) calls eventfs_remove_dir() with just a reference to that ei
> > eventfs_inode structure. I currently use the ei->dentry to tell VFS
> > "this directory is being deleted". What other means do I have to
> > accomplish the same thing?
> >
>
> Would kernfs_node_dentry() have been useful in that case?
> Just looking at kernfs_node and eventfs_inode gives a very strong
> smell of reinventing.
Close, but looking at kernfs real quick, I think I see the difference
and why eventfs relies on the dentry, and why it doesn't need to.
>
> Note that the fact that eventfs has no rename makes the whole dentry
> business a lot less complicated than it is in the general VFS case -
> IIUC, an eventfs path either exists or it does not exist, but if it exists,
> it conceptually always refers to the same underlying object (kprobe).
>
> I am not claiming that kernfs can do everything that eventfs needs -
> I don't know, because I did not look into it.
eventfs has one other major simplicity that kernfs does not. Not only
does it not have renames, but files are not created or deleted
individually. That is, when a directory is created, it will have a
fixed number of files. It will have those same files until the
directory is deleted.
That means I have no meta data saving anything about the files, except
a fixed array that that holds only a name and a callback for each file
in that directory.
>
> But the concept of detaching the pseudo filesystem backing objects
> from the vfs objects was already invented once and Greg has also
> said the same thing.
>
> My *feeling* at this point is that the best course of action is to use
> kernfs and to improve kernfs to meet eventfs needs, if anything needs
> improving at all.
>
> IMO, the length and content of this correspondence in itself
> is proof that virtual file systems should use an abstraction that
> is much less flexible than the VFS.
>
> Think of it this way - kernefs and VFS are both under-documented,
> but the chances of getting kernfs well documented are far better
> than ever being able to document all the subtleties of VFS for mortals.
>
> IOW, I would be happy if instead of the LSFMM topic
> "Making pseudo file systems inodes/dentries more like normal file systems"
> We would be talking about
> "Best practices for writing a pseudo filesystem" and/or
> "Missing kernfs functionality for writing pseudo filesystems"
I'm fine with that, and I think I also understand how to solve the
issue with Linus here.
Linus hates the fact that eventfs_inode has a reference to the dentry.
The reason I do that is because when the dentry is created, it happens
in lookup with a given file name. The registered callback for the file
name will return back a "data" pointer that gets assigned to the
dentry's inode->i_private data. This gets returned directly to the
open function calls.
That is, the kprobe will register a handle that gets assigned to the
inode->i_private data just like it did when it was in debugfs. Then the
open() call would get the kprode data via the inode->i_private.
This is why reference counters do not work here. If I don't make the
dentry and inode go away after the last reference, it is possible to
call that open function with the i_private data directly.
I noticed that kernfs assigns the kernfs_inode to the i_private data.
Thus it has some control over what's in the i_private.
I use that simple_recursive_removal() to clear out any inodes that have
a dentry ref count of zero so that there will be no inode->i_private
references in the open.
I'll have to look more at kernfs to see how it works. Perhaps it can
give me some ideas on not having to use dentry.
Hmm, I may have been looking at this all wrong. I don't need the
dentry. I need the inode! Instead of keeping a pointer to the dentry in
the eventfs_inode, I should just be keeping a pointer to the inode. As
then I could remove the inode->i_private to NULL on the last reference.
The open() calls will have to check for NULL in that case.
Are inodes part of the VFS system like dentry is?
I would think not as tracefs has a "container_of" around the inode to
get to the tracefs_inode, just like many other file systems do.
That would remove my need to have to keep around any dentry.
I first need to know why I'm seeing "ghost" files. That is without that
simple_recursive_removal() call, I get:
# echo 'p:sched schedule' >> /sys/kernel/tracing/kprobe_events
# echo 'p:timer read_current_timer' >> kprobe_events
# ls events/kprobes/
enable filter sched timer
# cat events/kprobes/sched/enable
0
# ls events/kprobes/sched
enable filter format hist hist_debug id inject trigger
# echo '-:sched schedule' >> /sys/kernel/tracing/kprobe_events
# ls events/kprobes
enable filter timer
# ls events/kprobes/sched/enable
events/kprobes/sched/enable
??
The sched directory has been deleted but I can still "ls" the files
within it.
-- Steve