Re: mmap over nfs leads to excessive system load

From: Trond Myklebust
Date: Wed Nov 16 2005 - 18:11:51 EST


On Wed, 2005-11-16 at 14:44 -0800, Andrew Morton wrote:

> Could peek at wbc->nr_pages, or add another boolean to writeback_control
> for this.
>
> diff -puN include/linux/writeback.h~writeback_control-flag-writepages include/linux/writeback.h
> --- devel/include/linux/writeback.h~writeback_control-flag-writepages 2005-11-16 14:43:52.000000000 -0800
> +++ devel-akpm/include/linux/writeback.h 2005-11-16 14:43:52.000000000 -0800
> @@ -53,10 +53,11 @@ struct writeback_control {
> loff_t start;
> loff_t end;
>
> - unsigned nonblocking:1; /* Don't get stuck on request queues */
> - unsigned encountered_congestion:1; /* An output: a queue is full */
> - unsigned for_kupdate:1; /* A kupdate writeback */
> - unsigned for_reclaim:1; /* Invoked from the page allocator */
> + unsigned nonblocking:1; /* Don't get stuck on request queues */
> + unsigned encountered_congestion:1; /* An output: a queue is full */
> + unsigned for_kupdate:1; /* A kupdate writeback */
> + unsigned for_reclaim:1; /* Invoked from the page allocator */
> + unsigned for_writepages:1; /* This is a writepages() call */
> };
>
> /*
> diff -puN mm/page-writeback.c~writeback_control-flag-writepages mm/page-writeback.c
> --- devel/mm/page-writeback.c~writeback_control-flag-writepages 2005-11-16 14:43:52.000000000 -0800
> +++ devel-akpm/mm/page-writeback.c 2005-11-16 14:43:52.000000000 -0800
> @@ -550,11 +550,17 @@ void __init page_writeback_init(void)
>
> int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
> {
> + int ret;
> +
> if (wbc->nr_to_write <= 0)
> return 0;
> + wbc->for_writepages = 1;
> if (mapping->a_ops->writepages)
> - return mapping->a_ops->writepages(mapping, wbc);
> - return generic_writepages(mapping, wbc);
> + ret = mapping->a_ops->writepages(mapping, wbc);
> + else
> + ret = generic_writepages(mapping, wbc);
> + wbc->for_writepages = 0;
> + return ret;
> }

That would work...

> > > For vmscan->writepage, wbc->for_reclaim is set, so we know that the IO
> > > should be pushed immediately. nfs_writepage() seems to dtrt here.
> > >
> > > With the proposed changes, we don't need that iput() in nfs_writepage().
> > > That worries me because I recall from a couple of years back that there are
> > > really subtle races with doing iput() on the vmscan->writepage() path.
> > > Cannot remember what they were though...
> >
> > Possibly to do with block filesystems that may trigger ->writepage()
> > while inside iput_final()? NFS can't do that.
>
> iput_final() can call truncate_inode_pages - maybe it was a deadlock, but
> I'm fairly sure it was a race.

Doesn't matter. There can be no dirty pages when NFS hits iput_final().
We make sure that we flush them into the filesystem accounting before we
release the file descriptor, then we make sure that we don't release the
dentry before the inode has been synced up.

Cheers,
Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/