Re: 2.6.6-rc{1,2} bad VM/NFS interaction in case of dirty pagewriteback

From: Andrew Morton
Date: Mon Apr 26 2004 - 21:17:04 EST


Shantanu Goel <sgoel01@xxxxxxxxx> wrote:
>
> Hi,
>
> During page reclamation when the scanner encounters a
> dirty page and invokes writepage(), if the FS layer
> returns WRITEPAGE_ACTIVATE as NFS does, I think the
> page should not be placed on the active as is
> presently done. This can cause a lot of extraneous
> swapout activity because in the presence of a large
> active list, the pages being written out will not be
> reclaimed quickly enough. It also seems counter
> intuitive since the scanner has just determined that
> the page has not been recently referenced.
>
> Shouldn't the following code from shrink_list():
>
> res = mapping->a_ops->writepage(page, &wbc);
> if (res < 0)
> handle_write_error(mapping, page, res);
> if (res == WRITEPAGE_ACTIVATE) {
> ClearPageReclaim(page);
> goto activate_locked;
> }
>
> read:
>
> res = mapping->a_ops->writepage(page, &wbc);
> if (res < 0)
> handle_write_error(mapping, page, res);
> if (res == WRITEPAGE_ACTIVATE) {
> ClearPageReclaim(page);
> goto keep_locked;
> }

WRITEPAGE_ACTIVATE is a bit of a hack to fix up specific peculiarities of
the interaction between tmpfs and page reclaim.

Trond, the changelog for that patch does not explain what is going on in
there - can you help out?

Also, what's the theory behind the handling of BDI_write_congested and
nfs_wait_on_write_congestion() in nfs_writepages()? From a quick peek it
looks like NFS should be waking the sleepers in blk_congestion_wait()
rather than doing it privately?

> I can observe the benefit of this change if I run a dd
> on an NFS mount with the active list full of mostly
> mapped pages. The stock kernel ends up paging out
> quite a bit of memory whereas the modified kernel does
> not.

yup. We should be able to handle the throttling and writeback scheduling
from within core VFS/VM. NFS should set and clear the backing_dev
congestion state appropriately and the VFS should take care of the rest.
The missing bit is the early blk_congestion_wait() termination.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/