Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU
From: Johannes Weiner
Date: Wed Feb 12 2020 - 11:42:39 EST
On Wed, Feb 12, 2020 at 08:25:45PM +0800, Yafang Shao wrote:
> On Wed, Feb 12, 2020 at 1:55 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > Another variant of this problem was recently observed, where the
> > kernel violates cgroups' memory.low protection settings and reclaims
> > page cache way beyond the configured thresholds. It was followed by a
> > proposal of a modified form of the reverted commit above, that
> > implements memory.low-sensitive shrinker skipping over populated
> > inodes on the LRU [1]. However, this proposal continues to run the
> > risk of attracting disproportionate reclaim pressure to a pool of
> > still-used inodes,
>
> Hi Johannes,
>
> If you really think that is a risk, what about the additional patch
> below to fix it?
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 80dddbc..61862d9 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -760,7 +760,7 @@ static bool memcg_can_reclaim_inode(struct inode *inode,
> 		goto out;
>
> 	cgroup_size = mem_cgroup_size(memcg);
> -	if (inode->i_data.nrpages + protection >= cgroup_size)
> +	if (inode->i_data.nrpages)
> 		reclaimable = false;
>
> out:
>
> With this additional patch, we skip all inodes in this memcg until all
> its page cache pages are reclaimed.
Well that's something we've tried and had to revert because it caused
issues in slab reclaim. See the History part of my changelog.
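
To make the difference between the two checks in the diff above concrete,
here is a minimal userspace sketch, not kernel code. The struct and the
function names (fake_inode, skip_original, skip_proposed) are hypothetical
stand-ins introduced for illustration; only the two conditions themselves
are taken from the quoted diff, and what "protection" and "cgroup_size"
mean exactly is assumed from their names there.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one field the quoted diff reads. */
struct fake_inode {
	unsigned long nrpages;	/* models inode->i_data.nrpages */
};

/*
 * Condition removed by the diff: the inode is skipped only when its
 * cached pages plus the protected amount reach the cgroup's size.
 */
static bool skip_original(const struct fake_inode *inode,
			  unsigned long protection,
			  unsigned long cgroup_size)
{
	return inode->nrpages + protection >= cgroup_size;
}

/*
 * Condition added by the diff: the inode is skipped whenever it has
 * any page cache at all, regardless of protection or cgroup size.
 */
static bool skip_proposed(const struct fake_inode *inode)
{
	return inode->nrpages != 0;
}
```

With, say, 10 cached pages, a protection of 100 and a cgroup size of 200
(made-up numbers), skip_original() lets the shrinker take the inode while
skip_proposed() does not. Under the proposed check, every populated inode
is unconditionally kept away from the shrinker.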
> > while not addressing the more generic reclaim
> > inversion problem outside of a very specific cgroup application.
> >
>
> But I have a different understanding. This method works like a
> knob. If you really care about your workingset (data), you should
> turn it on (i.e. by using memcg protection to protect them), while
> if you don't care about your workingset (data) then you'd better
> turn it off. That would be more flexible. Regarding your case in the
> commit log, why not protect your Linux git tree with memcg
> protection?
I can't imagine a scenario where I *wouldn't* care about my
workingset, though. Why should it be opt-in, not the default?