Re: [PATCH 2/3] sysctl: add support for drop_caches for individual filesystem

From: Jan Kara
Date: Thu Oct 10 2024 - 13:04:19 EST


On Thu 10-10-24 09:35:46, Benjamin Coddington wrote:
> On 10 Oct 2024, at 8:16, Jan Kara wrote:
>
> > On Thu 10-10-24 19:25:42, Ye Bin wrote:
> >> From: Ye Bin <yebin10@xxxxxxxxxx>
> >>
> >> In order to better analyze the issue of file system uninstallation caused
> >> by kernel module opening files, it is necessary to perform dentry recycling
> >
> > I don't quite understand the use case you mention here. Can you explain it
> > a bit more (that being said I've needed dropping caches for a particular sb
> > myself a few times for debugging purposes so I generally agree it is a
> > useful feature).
> >
> >> on a single file system. But now, apart from global dentry recycling, it is
> >> not supported to do dentry recycling on a single file system separately.
> >> This feature has usage scenarios in problem localization scenarios.At the
> >> same time, it also provides users with a slightly fine-grained
> >> pagecache/entry recycling mechanism.
> >> This patch supports the recycling of pagecache/entry for individual file
> >> systems.
> >>
> >> Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> >> ---
> >> fs/drop_caches.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> >> include/linux/mm.h | 2 ++
> >> kernel/sysctl.c | 9 +++++++++
> >> 3 files changed, 54 insertions(+)
> >>
> >> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> >> index d45ef541d848..99d412cf3e52 100644
> >> --- a/fs/drop_caches.c
> >> +++ b/fs/drop_caches.c
> >> @@ -77,3 +77,46 @@ int drop_caches_sysctl_handler(const struct ctl_table *table, int write,
> >> }
> >> return 0;
> >> }
> >> +
> >> +int drop_fs_caches_sysctl_handler(const struct ctl_table *table, int write,
> >> + void *buffer, size_t *length, loff_t *ppos)
> >> +{
> >> + unsigned int major, minor;
> >> + unsigned int ctl;
> >> + struct super_block *sb;
> >> + static int stfu;
> >> +
> >> + if (!write)
> >> + return 0;
> >> +
> >> + if (sscanf(buffer, "%u:%u:%u", &major, &minor, &ctl) != 3)
> >> + return -EINVAL;
> >
> > I think specifying bdev major & minor number is not a great interface these
> > days. In particular for filesystems which are not bdev based such as NFS. I
> > think specifying path to some file/dir in the filesystem is nicer and you
> > can easily resolve that to sb here as well.
>
> Slight disagreement here since NFS uses set_anon_super() and major:minor
> will work fine with it.

OK, fair point, anon bdev numbers can be used. But filesystems using
get_tree_nodev() would still be problematic.

> I'd prefer it actually since it avoids this
> interface having to do a pathwalk and make decisions about what's mounted
> where and in what namespace.

I don't understand the problem here. We'd do user_path_at(AT_FDCWD, ...,
&path) and then take path.mnt->mnt_sb. That doesn't look terribly
complicated to me. Plus it naturally deals with issues like namespacing
etc. although they are not a huge issue here because the functionality
should be restricted to CAP_SYS_ADMIN anyway.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR