Re: [regression, 3.0-rc1] dentry cache growth during unlinks, XFSperformance way down

From: Sage Weil
Date: Mon May 30 2011 - 00:19:30 EST


Hey Dave,

On Mon, 30 May 2011, Dave Chinner wrote:
> On Mon, May 30, 2011 at 12:06:04PM +1000, Dave Chinner wrote:
> > Folks,
> >
> > I just booted up a 3.0-rc1 kernel, and mounted an XFS filesystem
> > with 50M files in it. Running:
> >
> > $ for i in /mnt/scratch/*; do sudo /usr/bin/time rm -rf $i 2>&1 & done
> >
> > runs an 8-way parallel unlink on the files. Normally this runs at
> > around 80k unlinks/s, and it runs with about 500k-1m dentries and
> > inodes cached in the steady state.
> >
> > The steady state behaviour with 3.0-rc1 is that there are around 10m
> > cached dentries - all negative dentries - consuming about 1.6GB of
> > RAM (of 4GB total). Previous steady state was, IIRC, around 200MB of
> > dentries. My initial suspicions are that the dentry unhashing
> > changeÿÿ may be the cause of this...
>
> So a bisect lands on:
>
> $ git bisect good
> 79bf7c732b5ff75b96022ed9d29181afd3d2509c is the first bad commit
> commit 79bf7c732b5ff75b96022ed9d29181afd3d2509c
> Author: Sage Weil <sage@xxxxxxxxxxxx>
> Date: Tue May 24 13:06:06 2011 -0700
>
> vfs: push dentry_unhash on rmdir into file systems
>
> Only a few file systems need this. Start by pushing it down into each
> fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
> basis.
>
> This does not change behavior for any in-tree file systems.
>
> Acked-by: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: Sage Weil <sage@xxxxxxxxxxxx>
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>
> :040000 040000 c45d58718d33f7ca1da87f99fa538f65eaa3fe2c ec71cbecc59e8b142a7bfcabd469fa67486bef30 M fs
>
> Ok, so the question has to be asked - why wasn't dentry_unhash()
> pushed down into XFS?

Christoph asked me to leave it out to avoid the push-down + remove
noise. I missed it in v1, added it in v2, then took it out again.
Ultimately that isn't the real problem, though:

> Further, now that dentry_unhash() has been removed from most
> filesystems, what is replacing the shrink_dcache_parent() call that
> was cleaning up the "we can never reference again" child dentries of
> the unlinked directories? It appears that they are now being left in
> memory on the dentry LRU. It also appears that they have
> D_REFERENCED bit set, so they do not get immediately reclaimed by
> the shrinker.

Ah, yeah, that makes sense. I missed the shrink_dcache_parent side
effect. I suspect we just need something like the below? (Very lightly
tested!)

Thanks-
sage


From c1fac19b662b02ab4aea98ee2a8d0098bc985bc8 Mon Sep 17 00:00:00 2001
From: Sage Weil <sage@xxxxxxxxxxxx>
Date: Sun, 29 May 2011 20:35:44 -0700
Subject: [PATCH 1/3] vfs: shrink_dcache_parent before rmdir, dir rename

The dentry_unhash push-down series missed that shink_dcache_parent needs to
be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and
allow efficient dentry reclaim.

Reported-by: Dave Chinner <david@xxxxxxxxxxxxx>
Signed-off-by: Sage Weil <sage@xxxxxxxxxxxx>
---
fs/namei.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1ab641f..e2e4e8d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2579,6 +2579,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
if (error)
goto out;

+ shrink_dcache_parent(dentry);
error = dir->i_op->rmdir(dir, dentry);
if (error)
goto out;
@@ -2993,6 +2994,8 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
goto out;

+ if (target)
+ shrink_dcache_parent(new_dentry);
error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
if (error)
goto out;
--
1.7.1