[PATCH] Re: loops in get_user_pages() for VM_IO

From: Hugh Dickins
Date: Thu Nov 18 2004 - 15:05:24 EST


On Wed, 17 Nov 2004, Hugh Dickins wrote:
>
> I love the void make_pages_present(): at present it returns 0,
> error code or -1 - was that really supposed to mean -EPERM?
> What happens if someone tries to make_pages_present() more than
> is physically available? I think get_user_pages() returns -ENOMEM,
> the intervening levels ignore that, the process then gets OOM-killed.
> If that's acceptable, then I'd think just go with the simple patch,
> void make_pages_present(), for 2.6.10.

Well, no. wli specifically added mlock's check for get_user_pages
error in 2.6.0-test6; and I'd be reluctant to hide the -EFAULT from
trying to lock down pages beyond end of file. So here's the fairly
minimal patch I now suggest for 2.6.10....


Whereas get_user_pages simply fails (or stops early) on a VM_IO area
when filling a page vector, it currently proceeds to the follow_page/
handle_mm_fault loop when not passed a vector (by make_pages_present):
which may loop forever with follow_page failing on out-of-range pfn
and handle_mm_fault succeeding on pte already present.

This would already have been a problem for mlock, but I've made it worse
in 2.4.10-rc by updating vm_flags from driver's vma->vm_flags at the end
of do_mmap_pgoff (to avoid possible vma mismerge): if the driver sets
VM_LOCKED|VM_IO (not a recommended combination, but still done), then
make_pages_present is now called and a hang results.

We've removed VM_LOCKED from some of the drivers in question. But it's
surely unacceptable for an mlockall to hang or fail just because there
happens to be a VM_IO area somewhere in the address space.

We can argue whether VM_IO areas should be counted in locked_vm, but
go with a minimal fix for now: get_user_pages fail or stop on VM_IO
whether or not it's filling a page vector; but mlock_fixup (the only
place which checks for error from make_pages_present) skip VM_IO.

Signed-off-by: Hugh Dickins <hugh@xxxxxxxxxxx>

--- 2.6.10-bk3/mm/memory.c 2004-11-17 14:47:01.000000000 +0000
+++ linux/mm/memory.c 2004-11-18 18:00:04.204306016 +0000
@@ -761,7 +761,7 @@ int get_user_pages(struct task_struct *t
continue;
}

- if (!vma || (pages && (vma->vm_flags & VM_IO))
+ if (!vma || (vma->vm_flags & VM_IO)
|| !(flags & vma->vm_flags))
return i ? : -EFAULT;

--- 2.6.10-bk3/mm/mlock.c 2004-11-15 16:21:24.000000000 +0000
+++ linux/mm/mlock.c 2004-11-18 18:12:03.688927728 +0000
@@ -47,7 +47,8 @@ static int mlock_fixup(struct vm_area_st
pages = (end - start) >> PAGE_SHIFT;
if (newflags & VM_LOCKED) {
pages = -pages;
- ret = make_pages_present(start, end);
+ if (!(newflags & VM_IO))
+ ret = make_pages_present(start, end);
}

vma->vm_mm->locked_vm -= pages;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/