Re: [PATCH 1/2] vfs: Add hooks for filesystem-specific prune_icache_sb

From: Dave Chinner
Date: Mon Jun 27 2016 - 21:12:29 EST


On Fri, Jun 24, 2016 at 02:50:10PM -0500, Bob Peterson wrote:
> This patch adds filesystem-specific callbacks for shrinking the
> inode cache, prune_icache_sb. This is provided for filesystems in
> which the default VFS prune_icache_sb needs to be delayed due to
> filesystem locking requirements not met by vfs.
>
> Signed-off-by: Bob Peterson <rpeterso@xxxxxxxxxx>
> ---
> Documentation/filesystems/vfs.txt | 15 +++++++++++++++
> fs/inode.c | 1 +
> fs/super.c | 5 ++++-
> include/linux/fs.h | 3 +++
> 4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index c61a223..7cb4c5c 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -230,6 +230,7 @@ struct super_operations {
> ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
> int (*nr_cached_objects)(struct super_block *);
> void (*free_cached_objects)(struct super_block *, int);
> + long (*prune_icache_sb)(struct super_block *, struct shrink_control *);
> };

IIRC, I originally proposed this before adding the
nr_cached_objects/free_cached_objects interface because it allowed
filesystems direct control over inode reclaim. I don't recall the
reasons for it being rejected now - I think the main one was that
the inode life cycle was owned by the VFS, not the filesystem, and
that it woul dbe too easy for people to implement their own
shrinkers and get it all wrong and hence screw over the entire
system....

> All methods are called without any locks being held, unless otherwise
> @@ -319,6 +320,20 @@ or bottom half).
> implementations will cause holdoff problems due to large scan batch
> sizes.
>
> + prune_icache_sb: called by the sb cache shrinking function for the file
> + filesystem to reduce the number of inodes from slab. This is done to
> + accomodate file systems that may not be able to immediately remove
> + inodes from cache, but must queue them to be removed ASAP.
> +
> + This can happen in GFS2, for example, where evicting an inode
> + may require an inter-node lock (glock) which makes a call into DLM
> + (distributed lock manager), which may block for any number of reasons.
> + For example, it may block because a customer node needs to be fenced,
> + so the lock cannot be granted until the fencing is complete.
> + The fencing, in turn, may be blocked for other reasons, such as
> + memory allocations that caused the shrinker to be called in the first
> + place. Optional. If not set, the default vfs prune_icache_sb is called.

So why don't you simply do what XFS does? i.e. when ->evict/destroy is
called on an inode, you clean up the VFS portion of the inode but do
not destroy/free it - you simply queue it to an internal list and
then do the cleanup/freeing in your own time?

i.e. why do you need a special callout just to defer freeing to
another thread when we already have hooks than enable you to do
this?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx