Re: mmap over nfs leads to excessive system load

From: Andrew Morton
Date: Wed Nov 16 2005 - 17:44:57 EST


Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
>
> On Wed, 2005-11-16 at 14:10 -0800, Andrew Morton wrote:
>
> > > How is the filesystem supposed to distinguish between the cases
> > > "VM->writepage()", and "VM->writepages->mpage_writepages->writepage()"?
> > >
> >
> > Via the writeback_control, hopefully.
> >
> > For write_one_page(), sync_mode==WB_SYNC_ALL, so NFS should start the I/O
> > immediately (it appears to not do so).
>
> Sorry, but so does filemap_fdatawrite(). WB_SYNC_ALL clearly does not
> discriminate between a writepages() and a single writepage() situation,
> whatever the original intention was.

Could peek at wbc->nr_pages, or add another boolean to writeback_control
for this.

diff -puN include/linux/writeback.h~writeback_control-flag-writepages include/linux/writeback.h
--- devel/include/linux/writeback.h~writeback_control-flag-writepages 2005-11-16 14:43:52.000000000 -0800
+++ devel-akpm/include/linux/writeback.h 2005-11-16 14:43:52.000000000 -0800
@@ -53,10 +53,11 @@ struct writeback_control {
loff_t start;
loff_t end;

- unsigned nonblocking:1; /* Don't get stuck on request queues */
- unsigned encountered_congestion:1; /* An output: a queue is full */
- unsigned for_kupdate:1; /* A kupdate writeback */
- unsigned for_reclaim:1; /* Invoked from the page allocator */
+ unsigned nonblocking:1; /* Don't get stuck on request queues */
+ unsigned encountered_congestion:1; /* An output: a queue is full */
+ unsigned for_kupdate:1; /* A kupdate writeback */
+ unsigned for_reclaim:1; /* Invoked from the page allocator */
+ unsigned for_writepages:1; /* This is a writepages() call */
};

/*
diff -puN mm/page-writeback.c~writeback_control-flag-writepages mm/page-writeback.c
--- devel/mm/page-writeback.c~writeback_control-flag-writepages 2005-11-16 14:43:52.000000000 -0800
+++ devel-akpm/mm/page-writeback.c 2005-11-16 14:43:52.000000000 -0800
@@ -550,11 +550,17 @@ void __init page_writeback_init(void)

int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
{
+ int ret;
+
if (wbc->nr_to_write <= 0)
return 0;
+ wbc->for_writepages = 1;
if (mapping->a_ops->writepages)
- return mapping->a_ops->writepages(mapping, wbc);
- return generic_writepages(mapping, wbc);
+ ret = mapping->a_ops->writepages(mapping, wbc);
+ else
+ ret = generic_writepages(mapping, wbc);
+ wbc->for_writepages = 0;
+ return ret;
}

/**
_

> > For vmscan->writepage, wbc->for_reclaim is set, so we know that the IO
> > should be pushed immediately. nfs_writepage() seems to dtrt here.
> >
> > With the proposed changes, we don't need that iput() in nfs_writepage().
> > That worries me because I recall from a couple of years back that there are
> > really subtle races with doing iput() on the vmscan->writepage() path.
> > Cannot remember what they were though...
>
> Possibly to do with block filesystems that may trigger ->writepage()
> while inside iput_final()? NFS can't do that.

iput_final() can call truncate_inode_pages - maybe it was a deadlock, but
I'm fairly sure it was a race.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/