Re: mmap over nfs leads to excessive system load
From: Andrew Morton
Date: Wed Nov 16 2005 - 17:10:57 EST
Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
>
> On Wed, 2005-11-16 at 13:31 -0800, Andrew Morton wrote:
> > But block-backed filesytems have the same concern: we don't want to do a
> > whole bunch of 4k I/Os. Hence the writepages() interface, which is the
> > appropriate place to be building up these large I/Os.
> >
> > NFS does nfw_writepages->mpage_writepages->nfs_writepage and to build the
> > large I/Os it leaves the I/O pending on return from nfs_writepage(). It
> > appears to flush any pending pages on the exit path from nfs_writepages().
> >
> > If that's a correct reading then there doesn't appear to be any way in
> > which there's dangling I/O left to do after nfs_writepages() completes.
>
> Agreed. AFAICS, nfs_writepages should be quite OK, however writepage()
> on its own _is_ problematic.
>
> Look at the usage in write_one_page(), which calls directly down to
> ->writepage(), and then immediately does a wait_on_page_writeback().
>
> How is the filesystem supposed to distinguish between the cases
> "VM->writepage()", and "VM->writepages->mpage_writepages->writepage()"?
>
Via the writeback_control, hopefully.
For write_one_page(), sync_mode==WB_SYNC_ALL, so NFS should start the I/O
immediately (it appears to not do so).
For vmscan->writepage, wbc->for_reclaim is set, so we know that the IO
should be pushed immediately. nfs_writepage() seems to dtrt here.
With the proposed changes, we don't need that iput() in nfs_writepage().
That worries me because I recall from a couple of years back that there are
really subtle races with doing iput() on the vmscan->writepage() path.
Cannot remember what they were though...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/