Re: mmap over nfs leads to excessive system load

From: Trond Myklebust
Date: Wed Nov 16 2005 - 09:03:29 EST


On Tue, 2005-11-15 at 23:45 -0800, Andrew Morton wrote:

> So filemap_write_and_wait() has to write 2MB's worth of pages. Problem is,
> _all_ the pages, even the 99% which are clean are tagged as dirty in the
> pagecache radix tree. So find_get_pages_tag() ends up visiting each page
> in the file, and blows much CPU doing so.
>
> The writeout happens in mpage_writepages(), which uses
> clear_page_dirty_for_io() to clear PG_dirty. But it doesn't clear the
> dirty tag in the radix tree. It relies upon the filesystem to do the right
> thing later on. Which is all very unpleasant, sorry. See the explanatory
> comment over clear_page_dirty_for_io().

> nfs_writepage() doesn't do any of the things which that comment says it
> should, hence the radix tree tags are getting out of sync, hence this
> problem.
>
> NFS does strange, incomprehensible-to-little-akpms things in its writeout
> path. Ideally, it should run set_page_writeback() prior to unlocking the
> page and end_page_writeback() when I/O completes. That'll keep the VM
> happier while fixing this performance glitch.

Actually that will screw over performance even further by forcing us to
send out loads of little RPC requests to write 4k pages instead of
allowing us to gather those writes into 32k (or larger) chunks.

Anyhow, does the following patch help?

Cheers,
Trond
------
NFS: resync to yet more writepage() changes...

Ensure that we call clear_page_dirty() for pages that have been written
via writepage().

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
---

fs/nfs/write.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8f71e76..ea77da5 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -213,6 +213,7 @@ static int nfs_writepage_sync(struct nfs
} while (count);
/* Update file length */
nfs_grow_file(page, offset, written);
+ clear_page_dirty(page);
/* Set the PG_uptodate flag? */
nfs_mark_uptodate(page, offset, written);

@@ -238,6 +239,7 @@ static int nfs_writepage_async(struct nf
goto out;
/* Update file length */
nfs_grow_file(page, offset, count);
+ clear_page_dirty(page);
/* Set the PG_uptodate flag? */
nfs_mark_uptodate(page, offset, count);
nfs_unlock_request(req);


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/