Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY

From: Mike Waychison
Date: Sun Nov 30 2008 - 23:52:19 EST


Török Edwin wrote:
On 2008-11-29 01:02, Mike Waychison wrote:
Nick Piggin wrote:
On Thu, Nov 27, 2008 at 11:03:40AM -0800, Mike Waychison wrote:
Nick Piggin wrote:
On Thu, Nov 27, 2008 at 01:28:41AM -0800, Mike Waychison wrote:
Török however identified mmap taking on the order of several
milliseconds due to this exact problem:

http://lkml.org/lkml/2008/9/12/185
Turns out to be a different problem.

What do you mean?
His is just contending on the write side. The retry patch doesn't help.

I disagree. How do you get 'write contention' from the following
paragraph:

"Just to confirm that the problem is with pagefaults and mmap, I dropped
the mmap_sem in filemap_fault, and then
I got same performance in my testprogram for mmap and read. Of course
this is totally unsafe, because the mapping could change at any time."

It reads to me that the writers were held off by the readers sleeping
in IO.

It is true that I have write/write contention too, but do_page_fault
also shows up in lock_stat.

This is my guess at what happens:
* filemap_fault used to sleep with mmap_sem held while waiting for the
page lock.
* the Google patch avoids that, which is fine: if the page lock can't be
taken, it drops mmap_sem, waits, then retries the fault once
* however, after we have acquired the page lock, mapping->a_ops->readpage
is invoked, and mmap_sem is NOT dropped here:

	error = mapping->a_ops->readpage(file, page);
	if (!error) {
		wait_on_page_locked(page);

If my understanding is correct, ->readpage does the actual disk I/O and
keeps the page locked; when the lock is released we know the I/O has
finished. So wait_on_page_locked(page) sleeps with mmap_sem still held for
read during the disk I/O, preventing sys_mmap/sys_munmap from making
progress.

I don't know how to prove/disprove my guess above, suggestions welcome.

Could the patch be changed to also release the mmap_sem after readpage,
and before wait_on_page_locked?
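
To make the question above concrete, here is a minimal userspace sketch of
the two orderings. It is only a toy model, not kernel code and not the
patch under discussion: every toy_* name is hypothetical, mmap_sem is
reduced to a reader count, and the disk I/O is reduced to a flag flip. It
only shows why a writer (sys_mmap/sys_munmap) cannot run while a reader
sleeps with the semaphore held, and why dropping it first would let the
writer in.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins (hypothetical, NOT the kernel's mm_struct / struct page). */
struct toy_mm   { int mmap_sem_readers; };
struct toy_page { bool locked; bool uptodate; };

static void toy_down_read(struct toy_mm *mm) { mm->mmap_sem_readers++; }

static void toy_up_read(struct toy_mm *mm)
{
	assert(mm->mmap_sem_readers > 0);
	mm->mmap_sem_readers--;
}

/* Models ->readpage: I/O is started and the page stays locked. */
static void toy_readpage(struct toy_page *page) { page->locked = true; }

/* Models I/O completion: the page becomes up to date and is unlocked. */
static void toy_io_complete(struct toy_page *page)
{
	page->uptodate = true;
	page->locked = false;
}

/* A writer can take the semaphore only when no reader holds it. */
static bool toy_writer_can_proceed(const struct toy_mm *mm)
{
	return mm->mmap_sem_readers == 0;
}

/*
 * Run one simulated fault and report whether a writer could have run
 * while the faulting task was waiting for the I/O to finish.
 */
bool toy_writer_runs_during_io(bool drop_before_wait)
{
	struct toy_mm mm = { 0 };
	struct toy_page page = { false, false };
	bool writer_ok;

	toy_down_read(&mm);		/* fault entry: semaphore held for read */
	toy_readpage(&page);		/* I/O started, page locked */

	if (drop_before_wait)
		toy_up_read(&mm);	/* proposed ordering: release first */

	/* The point where wait_on_page_locked() would be sleeping. */
	writer_ok = toy_writer_can_proceed(&mm);

	toy_io_complete(&page);		/* the wait ends here */

	if (drop_before_wait)
		toy_down_read(&mm);	/* retry: retake; a real retry must also
					 * revalidate, since the mapping may
					 * have changed in the meantime */

	assert(page.uptodate && !page.locked);
	toy_up_read(&mm);
	return writer_ok;
}
```

Calling toy_writer_runs_during_io(false) models the current behaviour as I
understand it, and toy_writer_runs_during_io(true) models the proposed
ordering. The sketch says nothing about the cost of the retry itself.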

Ya, my suspicion is that there is still some other code path where we are
waiting on the locked page with mmap_sem still held. Ying and I will take
a closer look this week.