Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU

From: Johannes Weiner
Date: Wed Feb 12 2020 - 13:52:25 EST


On Wed, Feb 12, 2020 at 10:26:45AM -0800, Andrew Morton wrote:
> On Wed, 12 Feb 2020 11:35:40 -0500 Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> > Since the cache purging code was written for highmem scenarios, how
> > about making it specific to CONFIG_HIGHMEM at least?
>
> Why do I have memories of suggesting this a couple of weeks ago ;)

Sorry, you did. I went back and found your email just now. It completely
slipped my mind after that thread went off in another direction.

> > That way we improve the situation for the more common setups, without
> > regressing highmem configurations. And if somebody wanted to improve
> > the CONFIG_HIGHMEM behavior as well, they could still do so.
> >
> > Something like the below delta on top of my patch?
>
> Does it need to be that complicated? What's wrong with
>
> --- a/fs/inode.c~a
> +++ a/fs/inode.c
> @@ -761,6 +761,10 @@ static enum lru_status inode_lru_isolate
>  		return LRU_ROTATE;
>  	}
>
> +#ifdef CONFIG_HIGHMEM
> +	/*
> +	 * lengthy blah
> +	 */
>  	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
>  		__iget(inode);
>  		spin_unlock(&inode->i_lock);
> @@ -779,6 +783,7 @@ static enum lru_status inode_lru_isolate
>  		spin_lock(lru_lock);
>  		return LRU_RETRY;
>  	}
> +#endif

Pages can show up here even under !CONFIG_HIGHMEM. Because of the lock
ordering needed to maintain LRU state (i_lock -> xa_lock), the page cache
cannot unlink the inode from the LRU atomically when it inserts new
pages, so the shrinker might get here before inode_pages_set(). In that
case we need the shrinker to punt the inode off the LRU (the #else branch).
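
To make the race concrete, here is a minimal userspace sketch of the
decision the shrinker has to make. It is a toy model, not the kernel code
or the actual delta: every toy_* name is hypothetical, the locking is left
out entirely, and the highmem flag stands in for the compile-time
CONFIG_HIGHMEM switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's lru_status values. */
enum toy_lru_status { TOY_LRU_REMOVED, TOY_LRU_RETRY, TOY_LRU_FREE };

/* Hypothetical, heavily reduced inode: just the state the race touches. */
struct toy_inode {
	unsigned long nrpages;	/* page cache pages attached to the inode */
	bool on_lru;		/* still linked on the shrinker LRU? */
};

/*
 * Page cache side. The page is inserted first; unlinking the inode from
 * the LRU is a separate, later step, so the shrinker can run in between.
 */
static void toy_add_page(struct toy_inode *inode)
{
	inode->nrpages++;
}

/* Shrinker side, modelled on the shape of inode_lru_isolate(). */
static enum toy_lru_status toy_isolate(struct toy_inode *inode, bool highmem)
{
	assert(inode->on_lru);

	if (inode->nrpages) {
		if (highmem) {
			/* Highmem: purge the cache, then retry this inode. */
			inode->nrpages = 0;
			return TOY_LRU_RETRY;
		}
		/* Otherwise: keep the pages, punt the inode off the LRU. */
		inode->on_lru = false;
		return TOY_LRU_REMOVED;
	}

	/* No pages attached: the inode can be freed. */
	inode->on_lru = false;
	return TOY_LRU_FREE;
}
```

With a plain #ifdef around the purge block, the non-highmem case with
pages attached would fall through to the free path; in this model that is
the case that returns TOY_LRU_REMOVED instead.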

>  	WARN_ON(inode->i_state & I_NEW);
>  	inode->i_state |= I_FREEING;
> _
>
> Whatever we do will need plenty of testing. It wouldn't surprise me
> if there are people who unknowingly benefit from this code on
> 64-bit machines.

If we agree this is the way to go, I can put the patch into our tree
and gather data from the Facebook fleet before we merge it.