Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held

From: Andy Lutomirski
Date: Fri Jan 04 2013 - 13:16:35 EST

On Thu, Dec 20, 2012 at 4:49 PM, Michel Lespinasse <walken@xxxxxxxxxx> wrote:
> We have many vma manipulation functions that are fast in the typical case,
> but can optionally be instructed to populate an unbounded number of ptes
> within the region they work on:
> - mmap with MAP_POPULATE or MAP_LOCKED flags;
> - remap_file_pages() with MAP_NONBLOCK not set or when working on a
> VM_LOCKED vma;
> - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
> - brk() when mlock(MCL_FUTURE) is in effect.
> Current code handles these pte operations locally, while the surrounding
> code has to hold the mmap_sem write side since it's manipulating vmas.
> This means we're doing an unbounded amount of pte population work with
> mmap_sem held, and this causes problems as Andy Lutomirski reported
> (we've hit this at Google as well, though it's not entirely clear why
> people keep trying to use mlock(MCL_FUTURE) in the first place).
> I propose introducing a new mm_populate() function to do this pte
> population work after the mmap_sem has been released. mm_populate()
> does need to acquire the mmap_sem read side, but critically, it
> doesn't need to hold it continuously for the entire duration of the
> operation - it can drop it whenever things take too long (such as when
> hitting disk for a file read) and re-acquire it later on.

I still have quite a few instances of 2-6 ms of latency due to
"call_rwsem_down_read_failed __do_page_fault do_page_fault
page_fault". Any idea why? I don't know any great way to figure out
who is holding mmap_sem at the time. Given what my code is doing, I
suspect the contention is due to mmap or munmap on a file. MCL_FUTURE
is set, and MAP_POPULATE is not set.

It could be the other thread calling mmap and getting preempted (or
otherwise calling schedule()). Grr.
