[RFC][PATCH] try not to let dirty inodes fester

From: Dave Hansen
Date: Fri Oct 01 2010 - 15:15:28 EST



I've got a bug that I've been investigating. The inode cache for a
certain fs grows and grows, despite running

echo 2 > /proc/sys/vm/drop_caches

all the time. Not that running drop_caches is a good idea, but it
_should_ force things to stay under control. That is, unless the
inodes are dirty.

I think I'm seeing a case where the inode's dentry goes away and the
inode hits iput_final(). The inode is dirty, so it stays off the
inode_unused list, waiting around for writeback.

Then, the periodic writeback happens, and we end up in
wb_writeback(). One of the first things we do in the loop (before
writing out inodes) is this:

	if (work->for_background && !over_bground_thresh())
		break;

over_bground_thresh() doesn't take dirty inodes into account. So
if we are in a situation where there are no dirty pages, we will
trip this, and break. If the system continues to dirty inodes
without dirtying any pages along the way, I don't think we will
ever do periodic writeback of the dirty inodes.
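
To make the suspected starvation concrete, here is a toy userspace
model of the scenario described above. It is not kernel code: the
struct, the toy_* helpers, the batch size of 1024 and the threshold of
100 pages are all made up for illustration. The only property taken
from the description is that the threshold check looks at dirty pages
and runs before any inode is written.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a kernel symbol. */
struct toy_state {
	long dirty_pages;	/* what the background threshold looks at */
	long dirty_inodes;	/* dirty inodes that carry no dirty pages */
	long bground_thresh;	/* background threshold, in pages */
};

/* Stand-in for over_bground_thresh(): it only counts pages. */
static bool toy_over_bground_thresh(const struct toy_state *s)
{
	return s->dirty_pages > s->bground_thresh;
}

/*
 * One background pass with the threshold check first, as in the
 * unpatched loop. Returns the number of inodes written back.
 */
static long toy_background_pass(struct toy_state *s, long batch)
{
	long written = 0;

	assert(batch > 0);
	for (;;) {
		if (!toy_over_bground_thresh(s))
			break;	/* bails out before any inode is written */
		if (s->dirty_inodes == 0)
			break;
		long n = s->dirty_inodes < batch ? s->dirty_inodes : batch;
		s->dirty_inodes -= n;
		written += n;
	}
	return written;
}

/*
 * Dirty per_round inodes (and no pages) before each of rounds passes.
 * Returns how many inodes are still dirty at the end.
 */
static long toy_run(long rounds, long per_round)
{
	struct toy_state s = {
		.dirty_pages = 0,
		.dirty_inodes = 0,
		.bground_thresh = 100,
	};

	for (long i = 0; i < rounds; i++) {
		s.dirty_inodes += per_round;
		toy_background_pass(&s, 1024);
	}
	return s.dirty_inodes;
}
```

In this model toy_run(10, 50) returns 500: with no dirty pages, every
pass breaks out at the first check and the dirty inodes just pile up.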

The attached patch moves the check down below some of the inode
writeback. It seems to do some good, but I'm worried that it
will cause additional I/O when we are below the writeback
thresholds.
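
A rough userspace sketch of the reordered loop, to show both the
benefit and the cost I'm worried about. This is a model under
assumptions, not the patched kernel function: the sketch_* names, the
batch size and the io_below_thresh counter are invented for the
example. Each pass writes one batch first and only then consults the
page-based threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a kernel symbol. */
struct sketch_state {
	long dirty_pages;	/* what the background threshold looks at */
	long dirty_inodes;	/* dirty inodes that carry no dirty pages */
	long bground_thresh;	/* background threshold, in pages */
	long io_below_thresh;	/* batches issued while under the threshold */
};

/* Stand-in for over_bground_thresh(): it only counts pages. */
static bool sketch_over_thresh(const struct sketch_state *s)
{
	return s->dirty_pages > s->bground_thresh;
}

/*
 * One background pass with the check moved below the writeout.
 * Returns the number of inodes written back in this pass.
 */
static long sketch_patched_pass(struct sketch_state *s, long batch)
{
	long written = 0;

	assert(batch > 0);
	for (;;) {
		long n = s->dirty_inodes < batch ? s->dirty_inodes : batch;

		/* Count the I/O we would not have done before the patch. */
		if (n > 0 && !sketch_over_thresh(s))
			s->io_below_thresh++;

		s->dirty_inodes -= n;
		written += n;

		/* The threshold check now runs after one batch of writeout. */
		if (!sketch_over_thresh(s))
			break;
		if (n == 0)
			break;	/* nothing left to write */
	}
	return written;
}
```

Starting from 3000 dirty inodes and no dirty pages, three calls to
sketch_patched_pass(&s, 1024) drain the inodes (1024, 1024, then 952),
and io_below_thresh ends at 3. That counter is the extra I/O: every one
of those batches was issued while under the background threshold.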


---

linux-2.6.git-dave/fs/fs-writeback.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff -puN fs/fs-writeback.c~wb.diff fs/fs-writeback.c
--- linux-2.6.git/fs/fs-writeback.c~wb.diff 2010-10-01 12:12:11.000000000 -0700
+++ linux-2.6.git-dave/fs/fs-writeback.c 2010-10-01 12:12:11.000000000 -0700
@@ -625,12 +625,10 @@ static long wb_writeback(struct bdi_writ
break;

/*
- * For background writeout, stop when we are below the
- * background dirty threshold
+ * inodes are not accounted for in the background thresholds
+ * so we might leave too many of them dirty unless we do
+ * _some_ writeout without concern for over_bground_thresh()
*/
- if (work->for_background && !over_bground_thresh())
- break;
-
wbc.more_io = 0;
wbc.nr_to_write = MAX_WRITEBACK_PAGES;
wbc.pages_skipped = 0;
@@ -646,6 +644,13 @@ static long wb_writeback(struct bdi_writ
wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;

/*
+ * For background writeout, stop when we are below the
+ * background dirty threshold
+ */
+ if (work->for_background && !over_bground_thresh())
+ break;
+
+ /*
* If we consumed everything, see if we have more
*/
if (wbc.nr_to_write <= 0)
diff -puN MAINTAINERS~wb.diff MAINTAINERS
_