Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem

From: Amir Goldstein
Date: Fri Oct 11 2024 - 07:45:18 EST


On Thu, Oct 10, 2024 at 7:04 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Thu 10-10-24 09:35:46, Benjamin Coddington wrote:
> > On 10 Oct 2024, at 8:16, Jan Kara wrote:
> >
> > > On Thu 10-10-24 19:25:42, Ye Bin wrote:
> > >> From: Ye Bin <yebin10@xxxxxxxxxx>
> > >>
> > >> In order to better analyze the issue of file system uninstallation caused
> > >> by kernel module opening files, it is necessary to perform dentry recycling
> > >
> > > I don't quite understand the use case you mention here. Can you explain it
> > > a bit more (that being said I've needed dropping caches for a particular sb
> > > myself a few times for debugging purposes so I generally agree it is a
> > > useful feature).
> > >
> > >> on a single file system. But now, apart from global dentry recycling, it is
> > >> not supported to do dentry recycling on a single file system separately.
> > >> This feature has usage scenarios in problem localization scenarios.At the
> > >> same time, it also provides users with a slightly fine-grained
> > >> pagecache/entry recycling mechanism.
> > >> This patch supports the recycling of pagecache/entry for individual file
> > >> systems.
> > >>
> > >> Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> > >> ---
> > >> fs/drop_caches.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> > >> include/linux/mm.h | 2 ++
> > >> kernel/sysctl.c | 9 +++++++++
> > >> 3 files changed, 54 insertions(+)
> > >>
> > >> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > >> index d45ef541d848..99d412cf3e52 100644
> > >> --- a/fs/drop_caches.c
> > >> +++ b/fs/drop_caches.c
> > >> @@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
> > >> }
> > >> return 0;
> > >> }
> > >> +
> > >> +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
> > >> + void *buffer, size_t *length, loff_t *ppos)
> > >> +{
> > >> + unsigned int major, minor;
> > >> + unsigned int ctl;
> > >> + struct super_block *sb;
> > >> + static int stfu;
> > >> +
> > >> + if (!write)
> > >> + return 0;
> > >> +
> > >> + if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
> > >> + return -EINVAL;
> > >
> > > I think specifying bdev major & minor number is not a great interface these
> > > days. In particular for filesystems which are not bdev based such as NFS. I
> > > think specifying path to some file/dir in the filesystem is nicer and you
> > > can easily resolve that to sb here as well.
> >
> > Slight disagreement here since NFS uses set_anon_super() and major:minor
> > will work fine with it.
>
> OK, fair point, anon bdev numbers can be used. But filesystems using
> get_tree_nodev() would still be problematic.
>
> > I'd prefer it actually since it avoids this
> > interface having to do a pathwalk and make decisions about what's mounted
> > where and in what namespace.
>
> I don't understand the problem here. We'd do user_path_at(AT_FDCWD, ...,
> &path) and then take path.mnt->mnt_sb. That doesn't look terribly
> complicated to me. Plus it naturally deals with issues like namespacing
> etc. although they are not a huge issue here because the functionality
> should be restricted to CAP_SYS_ADMIN anyway.
>

Both looking up bdev and looking up path from write() can make syzbot
and lockdep very upset:
https://lore.kernel.org/linux-fsdevel/00000000000098f75506153551a1@xxxxxxxxxx/

I thought Christian had a proposal for dropping cache per-sb API via fadvise()
or something?

Why use sysfs API for this and not fd to reference an sb?

Thanks,
Amir.