Re: Linux 4.12-rc6

From: Linus Torvalds
Date: Mon Jun 19 2017 - 22:32:31 EST


On Tue, Jun 20, 2017 at 8:26 AM, Dave Jones <davej@xxxxxxxxxxxxxxxxx> wrote:
> > Hugh Dickins (1):
> > mm: larger stack guard gap, between vmas
>
> This seems to be buggered.
>
> 002331 00000396712307 0 2 kernel BUG at mm/mmap.c:1963!
> 002332 00000396712414 0 4 invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> 002333 00000396712541 0 4 CPU: 0 PID: 4572 Comm: trinity-c41 Not tainted 4.12.0-rc6-think+ #1
> 002336 00000396712959 0 4 RIP: 0010:unmapped_area_topdown+0xa5/0x170

Dave, do you have instructions for Hugh to recreate that with trinity
(or perhaps some way to generate a test-case from trinity?). Or does
it trigger easily by just running trinity?

I'm in China right now, and will be traveling again this afternoon, so
I probably can't look at it myself until later, but hopefully Hugh has
the cycles to follow up on it.

Hugh? The changes to unmapped_area_topdown() look trivial, but
obviously there's something wrong there. The code decodes to

    49 39 c0    cmp    %rax,%r8
    76 d0       jbe    0xfffffffffffffffb
  * 0f 0b       ud2    <-- trapping instruction

so from the

        VM_BUG_ON(gap_end < gap_start);

we have gap_start/end in %r8 and %rax respectively, which are:

        R08: 00007f7d54673000
        RAX: 00007f7d543d6000

so yes, gap_start is bigger than gap_end there by quite a degree (more
than the 1MB of the gap size unless I looked at it wrong).
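
To put a number on that, here is a throwaway user-space sketch (not
kernel code; the helper and macro names are made up for illustration,
and uint64_t stands in for the kernel's 64-bit unsigned long) that
subtracts the two register values from the oops:

```c
#include <stdint.h>

/* Register values copied from the oops quoted above. */
#define OOPS_GAP_START 0x00007f7d54673000ULL	/* R08 */
#define OOPS_GAP_END   0x00007f7d543d6000ULL	/* RAX */

/*
 * Hypothetical helper: how many bytes gap_start lies past gap_end,
 * or 0 if the gap is sane (gap_start <= gap_end).
 */
static uint64_t gap_overshoot(uint64_t gap_start, uint64_t gap_end)
{
	return gap_start > gap_end ? gap_start - gap_end : 0;
}
```

gap_overshoot(OOPS_GAP_START, OOPS_GAP_END) comes out as 0x29d000,
which is 669 pages of 4kB, or roughly 2.6MB.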

Hmm. Maybe it's this:

        /* Check if current node has a suitable gap */
        gap_end = vm_start_gap(vma);
        if (gap_end < low_limit)
                return -ENOMEM;
        if (gap_start <= high_limit && gap_end - gap_start >= length)
                goto found;

where it used to be that gap_end was guaranteed to be after gap_start,
but that's no longer true. We have

        gap_start = vma->vm_prev ? vm_end_gap(vma->vm_prev) : 0;
        gap_end = vm_start_gap(vma);

and by using MAP_FIXED, you can end up in the situation that
"vma->vm_prev" is closer to vma than the gap size.

So now gap_end - gap_start will underflow, and then the logic that
does "goto found" thinks it found a hole that is larger than
"length", when in actual fact it found a "negative-size" hole.
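
The wraparound is easy to see outside the kernel. A minimal sketch,
assuming 64-bit unsigned arithmetic (uint64_t in place of unsigned
long) and with the quoted condition lifted into a helper whose name is
invented here:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The "goto found" condition exactly as quoted above, as a
 * hypothetical stand-alone helper. All operands are unsigned, so
 * gap_end - gap_start wraps to a huge value when gap_end < gap_start.
 */
static bool gap_fits_as_written(uint64_t gap_start, uint64_t gap_end,
				uint64_t high_limit, uint64_t length)
{
	return gap_start <= high_limit && gap_end - gap_start >= length;
}
```

Feeding it the R08/RAX values from the oops, with any high_limit at or
above gap_start and a length of a single page, makes it report a fit
even though gap_end is below gap_start.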

So maybe that "goto found" condition should have an additional test
for "gap_end > gap_start"?
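
Something along these lines, as an untested user-space sketch of the
idea rather than a patch against mm/mmap.c (the helper name is made
up, and uint64_t again stands in for unsigned long):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical variant of the "goto found" condition with the extra
 * ordering test: the subtraction is only evaluated once gap_end is
 * known to be above gap_start, so it can no longer wrap.
 */
static bool gap_fits_checked(uint64_t gap_start, uint64_t gap_end,
			     uint64_t high_limit, uint64_t length)
{
	return gap_start <= high_limit &&
	       gap_end > gap_start &&
	       gap_end - gap_start >= length;
}
```

With the register values from the oops this rejects the "gap", while
an ordinary gap that is at least "length" bytes long is still
accepted.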

Or maybe I'm just hallucinating and missed something. Hugh, Oleg,
Michal, can you take another look and double-check this logic?

Linus