On 2008-11-29 01:02, Mike Waychison wrote:Nick Piggin wrote:On Thu, Nov 27, 2008 at 11:03:40AM -0800, Mike Waychison wrote:I disagree. How do you get 'write contention' from the followingNick Piggin wrote:His is just contending on the write side. The retry patch doesn't help.On Thu, Nov 27, 2008 at 01:28:41AM -0800, Mike Waychison wrote:What do you mean?Török however identified mmap taking on the order of severalTurns out to be a different problem.
milliseconds due to this exact problem:
http://lkml.org/lkml/2008/9/12/185
paragraph:
"Just to confirm that the problem is with pagefaults and mmap, I dropped
the mmap_sem in filemap_fault, and then
I got same performance in my testprogram for mmap and read. Of course
this is totally unsafe, because the mapping could change at any time."
It reads to me that the writers were held off by the readers sleeping
in IO.
It is true that I have a write/write contention too, but do_page_fault
shows up too on lock_stat.
This is my guess at what happens:
* filemap_fault used to sleep with mmap_sem held while waiting for the
page lock.
* the google patch avoids that, which is fine: if page lock can't be
taken, it drops mmap_sem, waits, then retries the fault once
* however after we acquired the page lock, mapping->a_ops->readpage is
invoked, mmap_sem is NOT dropped here:
error = mapping->a_ops->readpage(file, page);
if (!error) {
wait_on_page_locked(page);
If my understanding is correct ->readpage does the actual disk I/O, and
it keeps the page locked, when the lock is released we know it has finished.
So wait_on_page_locked(page) holds mmap_sem locked for read during the
disk I/O, preventing sys_mmap/sys_munmap from making progress.
I don't know how to prove/disprove my guess above, suggestions welcome.
Could the patch be changed to also release the mmap_sem after readpage,
and before wait_on_page_locked?