Re: Question about do_wp_page() and lock_page()

From: Hugh Dickins
Date: Thu Jan 15 2009 - 08:25:33 EST


On Thu, 15 Jan 2009, Nick Piggin wrote:
> On Thursday 15 January 2009 08:35:08 Larry Woodman wrote:
> > Why is it safe for do_wp_page() to call cow_user_page() without locking
> > the old_page first? If a Direct IO read is outstanding on the old_page
> > or it's the buffer to the file_read_actor(), can't its contents change
> > during the COW fault? Is this not a problem?

A problem for the application if it isn't synchronizing itself adequately.

>
> Direct IO doesn't lock the pages either.
>
> There has simply never been any synchronisation here (by design: see
> MADV_DONTFORK/MADV_DOFORK/VM_DONTCOPY etc).
>
> This is one thing which Andrea had been trying to improve. If it can
> be done without introducing nasty overheads, then yes it would be
> nice to improve COW vs DIO semantics.

As far as Larry's question goes, direct IO is just a red herring.

If the page being COWed is a shared file page, another process could
have that page mapped shared writable, and be concurrently modifying
it - locking in do_wp_page won't protect against that at all.

And even if the page being COWed is an anonymous page, the normal
case is that the COW was done before starting direct IO or file
read into this page by this process.

When multithreaded, there are indeed cases (fork) where the page
can get write-protected and COWed again, and that can give surprises.

But your operative words are "There has simply never been any
synchronisation here (by design": the app would have to provide
the synchronisation it needs. The kernel just makes sure it does
not free any pages while they're still subject to being modified.
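
As a rough illustration of one way an app might arrange that for itself,
here is a minimal userspace sketch using MADV_DONTFORK (mentioned in the
quoted text above) on an I/O buffer, so that fork never write-protects
the buffer's pages in the parent. The function name dontfork_buffer_demo()
and the mincore() probe in the child are assumptions made for this
example only; this is not a claim about what any particular application
does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: mark an anonymous buffer MADV_DONTFORK, fork, and
 * have the child check that the range is simply absent from its address
 * space. Returns 0 if that is what was observed, -1 otherwise.
 */
int dontfork_buffer_demo(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    char *buf;
    pid_t pid;
    int status, ok;

    buf = mmap(NULL, psz, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 'P';                       /* fault the page in */

    /* Ask the kernel not to copy this range into children at fork. */
    if (madvise(buf, psz, MADV_DONTFORK) != 0) {
        munmap(buf, psz);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        munmap(buf, psz);
        return -1;
    }
    if (pid == 0) {
        /* Child: mincore() fails with ENOMEM on an unmapped range. */
        unsigned char vec;
        int unmapped = (mincore(buf, psz, &vec) == -1 && errno == ENOMEM);
        _exit(unmapped ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid) {
        munmap(buf, psz);
        return -1;
    }

    /* Parent still sees its own data; the child never shared the page. */
    ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 && buf[0] == 'P';
    munmap(buf, psz);
    return ok ? 0 : -1;
}
```

Since the child never maps the range, the parent's page is not shared
with it, and a later write or read into the buffer by the parent does
not have to go through COW on account of that fork.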

It's consistent with the inconsistent way in which writing to a
MAP_PRIVATE mapping of a file faults in snapshots of pages of the
file as userspace modifies them, but leaves the intervening untouched
pages showing more recent modifications placed there by others.
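
A small userspace sketch of that MAP_PRIVATE behaviour, as one would
expect to see it on Linux (POSIX leaves it unspecified whether later
changes to the file show through a private mapping). The function name
private_snapshot_demo() and the temporary file path are made up for
this example.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: a two-page file filled with 'A' is mapped
 * MAP_PRIVATE. Page 0 is written through the mapping (COW snapshot),
 * then both pages of the file are overwritten with 'B' via pwrite().
 * Returns 0 if page 0 kept its snapshot while the untouched page 1
 * shows the newer file contents, -1 otherwise or on error.
 */
int private_snapshot_demo(void)
{
    char path[] = "/tmp/cow_demo_XXXXXX";
    long psz = sysconf(_SC_PAGESIZE);
    char *buf = NULL;
    char *map = MAP_FAILED;
    int ret = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives until fd is closed */

    buf = malloc(psz);
    if (!buf)
        goto out;

    /* Initial file contents: two pages of 'A'. */
    memset(buf, 'A', psz);
    if (pwrite(fd, buf, psz, 0) != psz || pwrite(fd, buf, psz, psz) != psz)
        goto out;

    map = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    map[0] = 'X';                       /* COW fault: page 0 is now private */

    /* Somebody else (here: ourselves, via the fd) rewrites the file. */
    memset(buf, 'B', psz);
    if (pwrite(fd, buf, psz, 0) != psz || pwrite(fd, buf, psz, psz) != psz)
        goto out;

    /* Page 0: our write plus the old 'A' snapshot. Page 1: the new 'B'. */
    if (map[0] == 'X' && map[1] == 'A' && map[psz] == 'B')
        ret = 0;

out:
    if (map != MAP_FAILED)
        munmap(map, 2 * psz);
    free(buf);
    close(fd);
    return ret;
}
```

The mapping ends up as a mix of an old private snapshot (page 0) and
current file data (page 1), with nothing in the kernel tying the two
together.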

Hugh