mmap_sem - usage

Mark Hemment
Fri, 16 Jan 1998 15:17:23 +0000 (GMT)


During a "normal" page fault, the faulting "mm" is locked by the
arch-specific fault handler, which takes the mm's semaphore (mmap_sem).
This is supposed to prevent the vma covering the faulting address from
being removed from under a clone()d task (i.e. when another clone()d task
sharing the same mm calls munmap()). The arch-specific fault handler then
calls the generic handler, handle_mm_fault(), if necessary.
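A minimal user-space sketch of that locking pattern follows. It is a model, not kernel code: a POSIX semaphore initialised to 1 stands in for the kernel semaphore (sem_wait()/sem_post() playing down()/up()), the structures are cut down to a few fields, handle_mm_fault() is a stub, and stack-growth handling is omitted.

```c
#include <semaphore.h>
#include <stddef.h>

/* Cut-down, hypothetical model of the kernel structures. */
struct vm_area_struct {
    unsigned long vm_start, vm_end;      /* covers [vm_start, vm_end) */
    struct vm_area_struct *vm_next;      /* address-sorted vma chain */
};

struct mm_struct {
    sem_t mmap_sem;                      /* binary semaphore, initial value 1 */
    struct vm_area_struct *mmap;         /* head of the vma chain */
};

/* First vma whose end lies above addr; it need not contain addr. */
static struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
    struct vm_area_struct *vma;

    for (vma = mm->mmap; vma != NULL; vma = vma->vm_next)
        if (addr < vma->vm_end)
            return vma;
    return NULL;
}

/* Stub for the generic handler: just reports success. */
static int handle_mm_fault(struct vm_area_struct *vma, unsigned long addr)
{
    (void)vma;
    (void)addr;
    return 1;
}

/* Arch-specific entry: while mmap_sem is held, a cooperating munmap()
 * cannot free the vma, so the pointer stays valid across the call. */
static int do_page_fault(struct mm_struct *mm, unsigned long addr)
{
    struct vm_area_struct *vma;
    int ret = 0;                         /* 0 = bad access */

    sem_wait(&mm->mmap_sem);             /* down(&mm->mmap_sem) */
    vma = find_vma(mm, addr);
    if (vma != NULL && vma->vm_start <= addr)
        ret = handle_mm_fault(vma, addr);
    sem_post(&mm->mmap_sem);             /* up(&mm->mmap_sem) */
    return ret;
}
```

The protection only holds if every path that modifies the vma chain takes the same semaphore, which is the point of the rest of this mail.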

In several places in the generic mm code, mmap_sem isn't taken before
modifying the vma chain. One example is splitting a vma when the access
permissions on part of its address range are changed (mprotect()).
Therefore, the 'vma' passed to handle_mm_fault() may have its
start/end/offset modified while the handler is blocked (say, for page I/O).
This doesn't actually cause a serious problem, as the offset into a named
file is calculated as:
    offset = (address&PAGE_MASK)-vma->vm_start+vma->vm_offset
and when a vma is split, the start and offset are kept 'in sync' (as they
should be; at least I hope they are).
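A small sketch of why the formula survives a split, assuming (as in kernels of this era) that vm_offset is a byte offset into the file. split_vma() here is a hypothetical simplification that only cuts a vma in two at a page-aligned address; it is not the kernel's mprotect() code.

```c
#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Hypothetical cut-down vma: only the fields the formula uses. */
struct vm_area_struct {
    unsigned long vm_start, vm_end;      /* [vm_start, vm_end) */
    unsigned long vm_offset;             /* file offset mapped at vm_start */
};

/* The formula from the text. */
static unsigned long file_offset(const struct vm_area_struct *vma,
                                 unsigned long address)
{
    return (address & PAGE_MASK) - vma->vm_start + vma->vm_offset;
}

/* Split at page-aligned addr: *vma keeps [vm_start, addr) and *tail gets
 * [addr, vm_end). The tail's offset advances by exactly as much as its
 * start does, which is what keeps start and offset "in sync". */
static void split_vma(struct vm_area_struct *vma, struct vm_area_struct *tail,
                      unsigned long addr)
{
    tail->vm_start = addr;
    tail->vm_end = vma->vm_end;
    tail->vm_offset = vma->vm_offset + (addr - vma->vm_start);
    vma->vm_end = addr;
}
```

For example, with vm_start = 0x10000 and vm_offset = 0x3000, address 0x18123 maps to file offset 0xB000. After splitting at 0x14000 the tail has vm_start = 0x14000 and vm_offset = 0x7000, and the same address still maps to 0xB000, so a handler that recomputes the offset from whichever vma it holds gets the same answer.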

Another example is munmap(), where a vma can be released without mmap_sem
being taken.
I guess these, and other places (such as the code for mlock()), need to
take the semaphore. Easily fixed up.
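A sketch of what "taking the semaphore" in munmap() could look like, again as a user-space model with a POSIX semaphore standing in for mmap_sem. sys_munmap_sketch() is hypothetical and deliberately simplified: it only removes a range that exactly matches one whole vma, whereas the real munmap() also trims and splits vmas.

```c
#include <semaphore.h>
#include <stdlib.h>

/* Hypothetical cut-down model of the kernel structures. */
struct vm_area_struct {
    unsigned long vm_start, vm_end;
    struct vm_area_struct *vm_next;
};

struct mm_struct {
    sem_t mmap_sem;                      /* binary semaphore, initial value 1 */
    struct vm_area_struct *mmap;
};

/* Returns 0 if a matching vma was unlinked and freed, -1 otherwise. */
static int sys_munmap_sketch(struct mm_struct *mm, unsigned long start,
                             unsigned long end)
{
    struct vm_area_struct **pp, *vma;
    int ret = -1;

    sem_wait(&mm->mmap_sem);             /* excludes a concurrent fault handler */
    for (pp = &mm->mmap; (vma = *pp) != NULL; pp = &vma->vm_next) {
        if (vma->vm_start == start && vma->vm_end == end) {
            *pp = vma->vm_next;          /* unlink from the chain */
            /* Safe only if every vma user also takes mmap_sem. */
            free(vma);
            ret = 0;
            break;
        }
    }
    sem_post(&mm->mmap_sem);
    return ret;
}
```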

Another inconsistency is in the calling of handle_mm_fault(). As I said
above, the arch-specific fault handler takes the semaphore before calling
this function and releases it upon return. Great.
In other places where handle_mm_fault() is called, the semaphore is
_not_ taken. I believe it should be.
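One way to catch callers that skip the semaphore is a debug check at the top of handle_mm_fault(). This is a hypothetical user-space sketch: mmap_sem_is_held() and fault_in_page() are invented names, and the check only sees that *someone* holds the semaphore (value 0 for a binary semaphore), not that the caller does, so it catches only callers that forgot the lock while nobody else held it.

```c
#include <semaphore.h>
#include <assert.h>

/* Hypothetical cut-down mm: only the semaphore matters here. */
struct mm_struct {
    sem_t mmap_sem;                      /* binary semaphore, initial value 1 */
};

/* Value <= 0 means held (Linux reports 0 even when there are waiters). */
static int mmap_sem_is_held(struct mm_struct *mm)
{
    int val;

    sem_getvalue(&mm->mmap_sem, &val);
    return val <= 0;
}

/* Stub generic handler that insists its caller did down(&mm->mmap_sem). */
static int handle_mm_fault(struct mm_struct *mm, unsigned long address)
{
    assert(mmap_sem_is_held(mm));
    (void)address;
    return 1;
}

/* A non-fault-path caller doing the locking argued for above. */
static int fault_in_page(struct mm_struct *mm, unsigned long address)
{
    int ret;

    sem_wait(&mm->mmap_sem);
    ret = handle_mm_fault(mm, address);
    sem_post(&mm->mmap_sem);
    return ret;
}
```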

If the semaphore is always taken before calling handle_mm_fault(), then
it is possible to simplify some of the fault-handling paths. This is
because it becomes possible to guarantee, for example, that no other task
will swap in the page and load the faulting PTE. So most of the re-checks
in swap_in() become redundant (as do some of the checks in do_wp_page():
even if the allocation blocks, no other task can get write permission
enabled in the PTE meanwhile).
Of course, it is still possible for a loaded PTE to be unloaded (by
swap_out()), but these cases are already handled correctly.
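The re-check being discussed can be modelled as below. This is a hypothetical sketch, not the real swap_in(): a PTE is a plain unsigned long, and the io callback marks the point where the real code sleeps on page I/O, letting a test simulate another clone()d task faulting the same page in meanwhile. The locked variant assumes, as argued above, that no other path changes a not-present PTE while mmap_sem is held.

```c
typedef unsigned long pte_t;

/* Runs where the real code would sleep; other tasks may run here. */
typedef void (*blocking_io_fn)(pte_t *page_table);

/* Callers do not all hold mmap_sem: after sleeping, compare the PTE with
 * the swap entry we started from. Returns 1 if our page was installed,
 * 0 if another task got there first (real code would free our page). */
static int swap_in_rechecking(pte_t *page_table, pte_t entry,
                              pte_t new_pte, blocking_io_fn io)
{
    io(page_table);
    if (*page_table != entry)            /* PTE changed while we slept */
        return 0;
    *page_table = new_pte;
    return 1;
}

/* Every caller holds mmap_sem: nobody else can load this PTE while we
 * sleep, so the re-check is redundant and the PTE is set directly. */
static int swap_in_locked(pte_t *page_table, pte_t entry,
                          pte_t new_pte, blocking_io_fn io)
{
    (void)entry;
    io(page_table);
    *page_table = new_pte;
    return 1;
}
```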

Ideally, mmap_sem should be replaced by finer-grained, per-vma
reader/writer sleep locks. But that is another discussion.

Any comments on mmap_sem?