Re: msync() behaviour broken for MS_ASYNC, revert patch?

From: Andrew Morton
Date: Thu Apr 01 2004 - 13:54:18 EST


"Stephen C. Tweedie" <sct@xxxxxxxxxx> wrote:
>
> > The advantage of the current MS_ASYNC is absolutely astoundingly HUGE:
> > because we don't wait for in-progress IO, it can be used to efficiently
> > synchronize multiple different areas, and then after that waiting for them
> > with _one_ single fsync().
>
> The Solaris one manages to preserve those properties while still
> scheduling the IO "soon". I'm not sure how we could do that in the
> current VFS, short of having a background thread scheduling deferred
> writepage()s as soon as the existing page becomes unlocked.

filemap_flush() will do exactly this. So if you want the Solaris
semantics, calling filemap_flush() instead of filemap_fdatawrite() should do
it.
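
For illustration, here is a minimal sketch of that substitution at the
msync() end.  The helper and its calling convention are hypothetical;
filemap_flush() and filemap_fdatawrite() are the real interfaces
(WB_SYNC_NONE and WB_SYNC_ALL writeback respectively):

#include <linux/fs.h>
#include <linux/mman.h>

/*
 * Hypothetical sketch: start writeback for an msync() request.  MS_ASYNC
 * gets the Solaris-like "schedule it soon" behaviour from the non-blocking
 * flush, which starts I/O on dirty pages but skips pages already under
 * writeback rather than waiting for them.  MS_SYNC keeps the full
 * writeout; the caller then waits with filemap_fdatawait().
 */
static int msync_start_writeout(struct address_space *mapping, int flags)
{
	if (flags & MS_ASYNC)
		return filemap_flush(mapping);		/* WB_SYNC_NONE */
	return filemap_fdatawrite(mapping);		/* WB_SYNC_ALL */
}

At the syscall level this preserves the property Linus describes above:
several msync(MS_ASYNC) calls over different ranges, then one fsync() to
wait for all of them.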

> posix_fadvise() seems to do something a little like this already: the
> FADV_DONTNEED handler tries
>
> if (!bdi_write_congested(mapping->backing_dev_info))
> 	filemap_flush(mapping);
>
> before going into the invalidate_mapping_pages() call. Having that (a)
> limited to the specific file range passed into the fadvise(), and (b)
> available as a separate function independent of the DONTNEED page
> invalidator, would seem like an entirely sensible extension.
>
> The obvious implementations would be somewhat inefficient in some cases,
> though --- currently __filemap_fdatawrite simply list_splice()s the
> inode dirty list into the io list. Walking a long dirty list to flush
> just a few pages from a narrow range could get slow, and walking the
> radix tree would be inefficient if there are only a few dirty pages
> hidden in a large cache of clean pages.

The patches I have queued in -mm allow us to do this. We use
find_get_pages_tag() to iterate over only the dirty pages in the tree.

That still has an efficiency problem: when searching for dirty pages we
also visit pages which are both dirty and under writeback, and a
non-blocking flush is not interested in those.  In practice I've only
observed that to be a problem when the queue size was bumped up to 10,000
requests, and I've fixed it up for the common cases by other means.
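
To make the dirty-tag walk concrete, here is a rough sketch against those
-mm interfaces, using the pagevec wrapper around find_get_pages_tag()
(pagevec_lookup_tag(); naming as assumed from that queue).  The function
itself is hypothetical and only counts the pages a non-blocking range
flush would act on; a real flush would hand each one to ->writepage():

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/page-flags.h>
#include <linux/pagemap.h>
#include <linux/pagevec.h>

/*
 * Hypothetical sketch: walk only the pages tagged dirty in the radix tree
 * for [start, end], skipping those already under writeback.
 */
static unsigned long count_flushable_pages(struct address_space *mapping,
					   pgoff_t start, pgoff_t end)
{
	struct pagevec pvec;
	pgoff_t index = start;
	unsigned long flushable = 0;
	unsigned int i;

	pagevec_init(&pvec, 0);
	while (index <= end &&
	       pagevec_lookup_tag(&pvec, mapping, &index,
				  PAGECACHE_TAG_DIRTY, PAGEVEC_SIZE)) {
		for (i = 0; i < pagevec_count(&pvec); i++) {
			struct page *page = pvec.pages[i];

			if (page->index > end)
				break;
			/*
			 * The dirty-tag lookup cannot distinguish dirty
			 * from dirty-and-under-writeback, so we still
			 * visit the latter even though a non-blocking
			 * flush skips them.
			 */
			if (!PageWriteback(page))
				flushable++;
		}
		pagevec_release(&pvec);
	}
	return flushable;
}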
