Re: Alternative msync() fix for 2.6.18?
From: Martin Michlmayr
Date: Fri Dec 29 2006 - 09:02:15 EST
CCing linux-kernel since Hugh's patch might be of interest to
non-Debian people too. This message concerns a problem with msync()
in 2.6.17 and 2.6.18 that can be found with the LSB test suite.
* Hugh Dickins <hugh@xxxxxxxxxxx> [2006-12-27 01:04]:
> On Tue, 26 Dec 2006, Hugh Dickins wrote:
> > On Tue, 26 Dec 2006, Martin Michlmayr wrote:
> > >
> > > The first message at http://bugs.debian.org/394392 contains some
> > > information about it, but I'm sure Jeff Licquia (CCed) can provide
> > > more information if necessary.
> >
> > I've given that a quick look, and I'll bet it's something very
> > easily and safely fixed by a much simpler patch, to mm/msync.c alone,
> > than by all the dirty pages rework and attendant unresolved problems.
>
> How quickly I forget! It wasn't until I looked into the git history
> that I remembered how I'd discovered this already, just before 2.6.17
> went out; but thought myself likely to introduce a more serious bug
> if I tried to fix it in a hurry. (Then forgot to send in a fix for
> 2.6.18, expecting Linus to put in Peter's more significant mods.)
>
> So beware of the fix below, which I'll confess to not even having
> tested in practice: please review carefully, it's a confusing loop
> (rather like the mincore one which Linus has just cleaned up in
> 2.6.20-rc); and it took me a while to decide which of our many
> versions of that loop to start from in fixing it - chose to
> rely on what I'd worked out earlier when redoing it for 2.6.19.
>
> Material for -stable? Well, my judgement was, and is, not really:
> the 2.6.17/2.6.18 behaviour isn't quite right, but though it doesn't
> match the man page and that testsuite, what it does isn't actually
> harmful at all. If this patch is more acceptable to Debian if it
> does appear in -stable, I can try sending it to Chris and Greg:
> but I won't be able to argue for it very forcefully.
I think it would make sense to have it in an -stable update because
it's a regression from 2.6.16 after all and it means that 2.6.17 and
2.6.18 are not LSB 3.1 compliant. However, I don't have a strong
opinion about it and I'm not sure another update for 2.6.18 is
planned.
> Maybe better for you to consider it as just a subpatch of what went
> into 2.6.19, which I was wrong to have mixed in with Peter's work.
>
> I certainly think that this patch below (if it satisfies your review
> and testing) is much more suitable for Debian's 2.6.18 than Peter's
> suite of patches. Not to put those down at all: the reverse, they're
> a significant contribution to 2.6.19, which just wouldn't be expected
> in any 2.6.18 kernel (and sadly, as you've found, are exposing still
> unexplained problems there). Anyway, the patch and its comment...
Yes, I agree. I'm CCing the linux-mm list in hope that someone can
review your patch. In the meantime, I've asked the Debian LSB folks to
verify that your patch fixes the LSB problem.
From: Hugh Dickins <hugh@xxxxxxxxxxx>
Fix 2.6.18 (or 2.6.17) sys_msync to report -ENOMEM as before when an
unmapped area falls within its range, and not to overshoot: to satisfy
LSB 3.1 tests and to fix Debian Bug#394392. Took the 2.6.19 sys_msync
as starting point (including its cleanup of repeated "current->mm"s),
reintroducing the msync_interval and balance_dirty_pages_ratelimited_nr
previously needed.
Signed-off-by: Hugh Dickins <hugh@xxxxxxxxxxx>
---
mm/msync.c | 66 +++++++++++++++++++++++++++----------------------------------
1 file changed, 30 insertions(+), 36 deletions(-)
--- 2.6.18/mm/msync.c 2006-09-20 04:42:06.000000000 +0100
+++ linux/mm/msync.c 2006-12-26 23:52:58.000000000 +0000
@@ -146,10 +146,10 @@ static int msync_interval(struct vm_area
asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
{
unsigned long end;
+ struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
int unmapped_error = 0;
int error = -EINVAL;
- int done = 0;
if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
goto out;
@@ -169,64 +169,58 @@ asmlinkage long sys_msync(unsigned long
* If the interval [start,end) covers some unmapped address ranges,
* just ignore them, but return -ENOMEM at the end.
*/
- down_read(¤t->mm->mmap_sem);
- vma = find_vma(current->mm, start);
- if (!vma) {
- error = -ENOMEM;
- goto out_unlock;
- }
- do {
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, start);
+ for (;;) {
unsigned long nr_pages_dirtied = 0;
struct file *file;
+ /* Still start < end. */
+ error = -ENOMEM;
+ if (!vma)
+ goto out_unlock;
/* Here start < vma->vm_end. */
if (start < vma->vm_start) {
- unmapped_error = -ENOMEM;
start = vma->vm_start;
- }
- /* Here vma->vm_start <= start < vma->vm_end. */
- if (end <= vma->vm_end) {
- if (start < end) {
- error = msync_interval(vma, start, end, flags,
- &nr_pages_dirtied);
- if (error)
- goto out_unlock;
- }
- error = unmapped_error;
- done = 1;
- } else {
- /* Here vma->vm_start <= start < vma->vm_end < end. */
- error = msync_interval(vma, start, vma->vm_end, flags,
- &nr_pages_dirtied);
- if (error)
+ if (start >= end)
goto out_unlock;
+ unmapped_error = -ENOMEM;
}
+ /* Here vma->vm_start <= start < vma->vm_end. */
+ error = msync_interval(vma, start, min(end, vma->vm_end),
+ flags, &nr_pages_dirtied);
+ if (error)
+ goto out_unlock;
file = vma->vm_file;
start = vma->vm_end;
if ((flags & MS_ASYNC) && file && nr_pages_dirtied) {
get_file(file);
- up_read(¤t->mm->mmap_sem);
+ up_read(&mm->mmap_sem);
balance_dirty_pages_ratelimited_nr(file->f_mapping,
nr_pages_dirtied);
fput(file);
- down_read(¤t->mm->mmap_sem);
- vma = find_vma(current->mm, start);
+ if (start >= end)
+ goto out;
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, start);
} else if ((flags & MS_SYNC) && file &&
(vma->vm_flags & VM_SHARED)) {
get_file(file);
- up_read(¤t->mm->mmap_sem);
+ up_read(&mm->mmap_sem);
error = do_fsync(file, 0);
fput(file);
- down_read(¤t->mm->mmap_sem);
- if (error)
- goto out_unlock;
- vma = find_vma(current->mm, start);
+ if (error || start >= end)
+ goto out;
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, start);
} else {
+ if (start >= end)
+ goto out_unlock;
vma = vma->vm_next;
}
- } while (vma && !done);
+ }
out_unlock:
- up_read(¤t->mm->mmap_sem);
+ up_read(&mm->mmap_sem);
out:
- return error;
+ return error ? : unmapped_error;
}
--
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/