Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU

From: Yafang Shao
Date: Wed Feb 12 2020 - 20:48:08 EST


On Thu, Feb 13, 2020 at 12:42 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Feb 12, 2020 at 08:25:45PM +0800, Yafang Shao wrote:
> > On Wed, Feb 12, 2020 at 1:55 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > > Another variant of this problem was recently observed, where the
> > > kernel violates cgroups' memory.low protection settings and reclaims
> > > page cache way beyond the configured thresholds. It was followed by a
> > > proposal of a modified form of the reverted commit above, that
> > > implements memory.low-sensitive shrinker skipping over populated
> > > inodes on the LRU [1]. However, this proposal continues to run the
> > > risk of attracting disproportionate reclaim pressure to a pool of
> > > still-used inodes,
> >
> > Hi Johannes,
> >
> > If you really think that is a risk, what about the additional patch
> > below to address it?
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 80dddbc..61862d9 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -760,7 +760,7 @@ static bool memcg_can_reclaim_inode(struct inode *inode,
> >                 goto out;
> >
> >         cgroup_size = mem_cgroup_size(memcg);
> > -       if (inode->i_data.nrpages + protection >= cgroup_size)
> > +       if (inode->i_data.nrpages)
> >                 reclaimable = false;
> >
> >  out:
> >
> > With this additional patch, we skip all inodes in this memcg until all
> > its page cache pages are reclaimed.
>
> Well that's something we've tried and had to revert because it caused
> issues in slab reclaim. See the History part of my changelog.
>

You misunderstood it.
The reverted patch skips all inodes in the system, while this patch
only takes effect when you turn on memcg.{min, low} protection.
IOW, it is not the default behavior: it only works when you opt in,
and it only affects your targeted memcg rather than the whole system.
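
To make the difference concrete, here is a rough userspace sketch of the
two checks. It is only a model, not the kernel code: struct model_inode,
struct model_memcg and both function names are made up for illustration,
and the early return when protection is zero is my assumption about how
the opt-in behaves.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's types. */
struct model_inode {
	unsigned long nrpages;		/* models inode->i_data.nrpages */
};

struct model_memcg {
	unsigned long protection;	/* pages under memory.{min,low}; 0 = off */
	unsigned long size;		/* models mem_cgroup_size(memcg) */
};

/*
 * Threshold variant: skip the inode only if its page cache plus the
 * protected amount reaches the current size of the memcg.
 */
static bool can_reclaim_threshold(const struct model_inode *inode,
				  const struct model_memcg *memcg)
{
	/* Assumption: without protection, nothing is skipped. */
	if (memcg == NULL || memcg->protection == 0)
		return true;

	if (inode->nrpages + memcg->protection >= memcg->size)
		return false;

	return true;
}

/*
 * Variant from the diff above: inside a protected memcg, skip every
 * inode that still has page cache attached.
 */
static bool can_reclaim_populated(const struct model_inode *inode,
				  const struct model_memcg *memcg)
{
	/* Assumption: without protection, nothing is skipped. */
	if (memcg == NULL || memcg->protection == 0)
		return true;

	if (inode->nrpages)
		return false;

	return true;
}
```

In this model an unprotected memcg never skips an inode with either
check, so the behavior outside of protected memcgs stays as it is today.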

> > > while not addressing the more generic reclaim
> > > inversion problem outside of a very specific cgroup application.
> > >
> >
> > But I have a different understanding. This method works like a
> > knob. If you really care about your workingset (data), you should
> > turn it on (i.e. by using memcg protection to protect them), while
> > if you don't care about your workingset (data) then you'd better
> > turn it off. That would be more flexible. Regarding your case in the
> > commit log, why not protect your linux git tree with memcg
> > protection?
>
> I can't imagine a scenario where I *wouldn't* care about my
> workingset, though. Why should it be opt-in, not the default?

Because the default behavior has caused the XFS performance hit.
(I haven't checked your patch carefully, so I don't know whether your
patch fixes it yet.)


Thanks

Yafang