Re: mmotm 2010-12-16 - breaks mlockall() call

From: Michel Lespinasse
Date: Tue Dec 21 2010 - 01:26:58 EST


On Sat, Dec 18, 2010 at 05:10:59AM -0500, Valdis.Kletnieks@xxxxxx wrote:
> On Thu, 16 Dec 2010 14:56:39 PST, akpm@xxxxxxxxxxxxxxxxxxxx said:
> > The mm-of-the-moment snapshot 2010-12-16-14-56 has been uploaded to
> >
> > http://userweb.kernel.org/~akpm/mmotm/
>
> The patch mlock-only-hold-mmap_sem-in-shared-mode-when-faulting-in-pages.patch
> causes this chunk of code from cryptsetup-luks to fail during the initramfs:
>
> 	if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
> 		log_err(ctx, _("WARNING!!! Possibly insecure memory. Are you root?\n"));
> 		_memlock_count--;
> 		return 0;
> 	}
>
> Bisection fingered this patch, which was added after -rc4-mmotm1202, which
> boots without tripping this log_err() call. I haven't tried building a
> -rc6-mmotm1216 with this patch reverted, because reverting it causes apply
> errors for subsequent patches.
>
> Ideas?

So I traced this down using Valdis's initramfs image. This is actually
an interesting corner case:

Some VMA has the VM_MAYREAD/VM_MAYWRITE/VM_MAYEXEC flags set, but is
currently protected with PROT_NONE permissions (the VM_READ/VM_WRITE/VM_EXEC
flags are all cleared).
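
A minimal user-space sketch of how such a VMA can arise, and how one might
probe mlockall() against it. The function name is hypothetical and not taken
from cryptsetup; the result depends on the kernel version, RLIMIT_MEMLOCK and
privileges, so the sketch only reports what happened rather than assuming an
outcome:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical probe. Returns -1 if the PROT_NONE mapping could not be
 * set up, 0 if mlockall() succeeded, or the (positive) errno value if
 * mlockall() failed.
 */
static int mlockall_with_prot_none_vma(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int result = 0;

	assert(page > 0);

	/* Map read/write first, so the VM_MAY* bits are set on the VMA... */
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* ...then drop all access, clearing VM_READ/VM_WRITE/VM_EXEC. */
	if (mprotect(p, (size_t)page, PROT_NONE)) {
		munmap(p, (size_t)page);
		return -1;
	}

	/* The call under discussion: try to lock everything mapped so far. */
	if (mlockall(MCL_CURRENT))
		result = errno;
	else
		munlockall();

	munmap(p, (size_t)page);
	return result;
}
```

A non-zero return from this probe is not by itself proof of the regression,
since mlockall() can also fail for lack of privileges or a low memlock limit.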

When mlockall() is called, the old code would see mlock_fixup() return
an error for that VMA, which would be ignored by do_mlockall(). The new
code did not ignore errors from do_mlock_pages(), which broke backwards
compatibility.
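
The difference between the two behaviours can be modelled in user space with
a toy loop. This is only a sketch: the toy_* names and struct toy_vma are
made up for illustration, and it assumes that faulting in a PROT_NONE range
fails with -EFAULT, which the strict path then reports as -ENOMEM in the way
__mlock_posix_error_return() does:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a VMA: an address range plus an "accessible" bit. */
struct toy_vma {
	unsigned long start, end;
	int accessible;		/* 0 models a PROT_NONE mapping */
};

/* Assumption: faulting in an inaccessible range fails with -EFAULT. */
static int toy_fault_in(const struct toy_vma *vma)
{
	return vma->accessible ? 0 : -EFAULT;
}

/*
 * Toy model of the per-VMA loop. With ignore_errors == 0 the first
 * failure aborts the walk (strict, mlock()-like behaviour); with
 * ignore_errors != 0 the failing VMA is skipped and the walk goes on
 * (the old mlockall() behaviour). *populated counts the VMAs that
 * were faulted in successfully.
 */
static int toy_mlock_pages(const struct toy_vma *vmas, size_t n,
			   int ignore_errors, size_t *populated)
{
	size_t i;
	int ret = 0;

	assert(populated != NULL);
	*populated = 0;

	for (i = 0; i < n; i++) {
		ret = toy_fault_in(&vmas[i]);
		if (ret < 0) {
			if (ignore_errors) {
				ret = 0;
				continue;	/* continue at next VMA */
			}
			ret = -ENOMEM;	/* -EFAULT reported as -ENOMEM */
			break;
		}
		(*populated)++;
	}
	return ret;
}
```

With three toy VMAs where the middle one is inaccessible, the strict mode
stops after the first VMA and returns an error, while the ignoring mode
populates the two accessible VMAs and returns 0.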

So a trivial fix that makes mlockall() behave as it did before could be
as follows:

Signed-off-by: Michel Lespinasse <walken@xxxxxxxxxx>

diff --git a/mm/mlock.c b/mm/mlock.c
index db0ed84..168b750 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -424,7 +424,7 @@ static int do_mlock(unsigned long start, size_t len, int on)
 	return error;
 }

-static int do_mlock_pages(unsigned long start, size_t len)
+static int do_mlock_pages(unsigned long start, size_t len, int ignore_errors)
 {
 	struct mm_struct *mm = current->mm;
 	unsigned long end, nstart, nend;
@@ -465,6 +465,10 @@ static int do_mlock_pages(unsigned long start, size_t len)
 		 */
 		ret = __mlock_vma_pages_range(vma, nstart, nend, &locked);
 		if (ret < 0) {
+			if (ignore_errors) {
+				ret = 0;
+				continue;	/* continue at next VMA */
+			}
 			ret = __mlock_posix_error_return(ret);
 			break;
 		}
@@ -502,7 +506,7 @@ SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len)
 		error = do_mlock(start, len, 1);
 	up_write(&current->mm->mmap_sem);
 	if (!error)
-		error = do_mlock_pages(start, len);
+		error = do_mlock_pages(start, len, 0);
 	return error;
 }

@@ -567,8 +571,10 @@ SYSCALL_DEFINE1(mlockall, int, flags)
 	    capable(CAP_IPC_LOCK))
 		ret = do_mlockall(flags);
 	up_write(&current->mm->mmap_sem);
-	if (!ret && (flags & MCL_CURRENT))
-		ret = do_mlock_pages(0, TASK_SIZE);
+	if (!ret && (flags & MCL_CURRENT)) {
+		/* Ignore errors */
+		do_mlock_pages(0, TASK_SIZE, 1);
+	}
 out:
 	return ret;
 }

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.