On Sun, Aug 19, 2001 at 04:59:06AM +0200, Andrea Arcangeli wrote:
> @@ -587,7 +591,21 @@
> return -ENOMEM;
> spin_lock(&vma->vm_mm->page_table_lock);
> vma->vm_start = address;
> +
> + /*
> + * vm_pgoff locking is a bit subtle: everybody but expand_stack is
> + * playing with the vm_pgoff with the write semaphore acquired. The
> + * only one playing with vm_pgoff with only the read semaphore
> + * acquired is expand_stack and it serializes against itself with the
> + * spinlock.
> + *
> + * More in general this means that it is not enough to grab the mmap_sem
> + * in read mode to avoid vm_pgoff to change under you. You either
> + * need the write semaphore acquired, or the read semaphore plus
> + * the spinlock.
> + */
> vma->vm_pgoff -= grow;
> +
> vma->vm_mm->total_vm += grow;
> if (vma->vm_flags & VM_LOCKED)
> vma->vm_mm->locked_vm += grow;
unfortunately I was way too optimistic about this and I also misread
part of the code while writing the above. Looking more closely
expand_stack is a race condition in itself.
Nobody is allowed to change vm_pgoff or vm_start without holding _both_
the mm sem in _write_ mode _and_ the spinlock.
expand_stack holds the mm sem in _read_ mode and the spinlock so it is
totally broken.
All the readers thinks that only holding only the read semaphore is
enough to get coherent data but expand_stack is breaking this rule and
so all the readers can race.
To fix this problem we simply need to convert all the callers of
expand_stack to hold the write semaphore instead of the read semaphore
(this will have to be propagated to all architectures). I just checked
all the callers and they're all convertible without any real pain (we
just need to do a second lookup after upgrading the lock because we
don't have a primitive to upgrade the lock from "read" to "write"
atomically without having to release it for some time in the middle, but
expand_stack is a slow path so it's not a showstopper).
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Thu Aug 23 2001 - 21:00:29 EST