Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU

From: Johannes Weiner
Date: Tue May 12 2020 - 17:30:01 EST


On Tue, Feb 11, 2020 at 12:55:07PM -0500, Johannes Weiner wrote:
> The VFS inode shrinker is currently allowed to reclaim inodes with
> populated page cache. As a result it can drop gigabytes of hot and
> active page cache on the floor without consulting the VM (recorded as
> "inodesteal" events in /proc/vmstat).

I'm sending a rebased version of this patch.

We've been running with this change in the Facebook fleet since
February with no ill side effects observed.

However, I just spent several hours chasing a mysterious reclaim
problem that turned out to be this bug again on an unpatched system.

In the scenario I was debugging, the problem wasn't that we were
losing cache, but that we were losing the non-resident information for
previously evicted cache.

I understood the file set enough to know it was thrashing like crazy,
but it didn't register as refaults to the kernel. Without detecting
the refaults, reclaim wouldn't start swapping to relieve the
struggling cache (plenty of cold anon memory around). It also meant
the IO delays of those refaults didn't contribute to memory pressure
in psi, which made userspace blind to the situation as well.
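
For anyone trying to spot this on their own machines, here is a rough
userspace sketch (not part of the patch) that parses the counters
involved. Assumptions: the helper names are made up for illustration;
/proc/vmstat exposes pginodesteal and kswapd_inodesteal; refaults show
up as workingset_refault or, on newer kernels, as
workingset_refault_anon/_file (hence the prefix match); and
/proc/pressure/memory only exists when psi is built in and enabled.

```python
import time

# Counter name prefixes of interest in /proc/vmstat (prefix match, since
# the exact refault counter names differ between kernel versions).
VMSTAT_PREFIXES = ("pginodesteal", "kswapd_inodesteal", "workingset_refault")


def parse_vmstat(text, prefixes=VMSTAT_PREFIXES):
    """Return {name: value} for the "name value" lines matching prefixes."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].startswith(prefixes):
            counters[fields[0]] = int(fields[1])
    return counters


def parse_psi(text):
    """Parse psi lines such as "some avg10=0.00 ... total=0" into dicts."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def delta(before, after):
    """Per-counter difference between two parse_vmstat() snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


def read_file(path):
    with open(path) as f:
        return f.read()


def sample(interval=10.0):
    """Snapshot vmstat twice, interval seconds apart; return (deltas, psi)."""
    before = parse_vmstat(read_file("/proc/vmstat"))
    time.sleep(interval)
    after = parse_vmstat(read_file("/proc/vmstat"))
    try:
        psi = parse_psi(read_file("/proc/pressure/memory"))
    except OSError:
        psi = {}  # psi not available on this kernel or configuration
    return delta(before, after), psi
```

Calling sample() while the workload runs gives the counter deltas and
the current memory pressure averages. Growing inodesteal deltas, next to
flat refault deltas and near-zero memory pressure on a workload you
know is thrashing, would be consistent with the situation described
above, though it is not proof of it.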

The first aspect means we can get stuck in pathological thrashing; the
second means userspace OOM detection breaks and we can leave servers
(or Android devices, for that matter) hopelessly livelocked.

New patch attached below. I hope we can get this fixed in 5.8; it's
really quite a big hole in our cache management strategy.

---